I. INTRODUCTION

It would be almost a banality to say that information permeates all human activity, from everyday life to science and engineering. No conscious action is taken without some information being received and processed. In spite of its overwhelming importance, the science that has information as its main subject still lags somewhat behind, with classical Information Theory being the only branch that is reasonably well developed at the present time. Information Theory, as we know it now, is predominantly a theory of optimal transmission that, by design, is unconcerned with the content of the transmitted information.

Summarizing the overall notion of information as it can be derived from experience, it would not be unreasonable to say that, at a high level of description, information can generally be characterized by its quantity, accuracy and relevance. Moreover, these three characteristics cannot in general be reduced to one another and thus can be treated as independent. Clearly, while information quantity can be studied irrespective of its possible content, the latter has to be taken into account if one wishes to describe both the accuracy and the relevance of the information in question. Additionally, these two characteristics can only be sensibly discussed relative to a particular problem that an agent attempts to solve with the help of the information. In this context, roughly speaking, the accuracy of the information measures the degree of agreement between the information conveyed to the agent (by a suitable information source) and the "true" state of affairs. Relevance, on the other hand, measures the degree of influence that possessing (accurate) information of a particular kind can have on the solution quality for the problem in question, assuming the agent makes full use of it. Looked at from this point of view, classical Information Theory deals with information quantity while being explicitly indifferent to both its accuracy and its relevance to any problem. In fact, it is also clear that the overall problem of information transmission (as opposed to information acquisition and usage) that classical Information Theory addresses would not, in general, require any explicit reference to either the accuracy or the relevance of the information being transmitted.

Pursuing the point of view described in the previous paragraph, one could also look at the overall problem of information study from the angle of the path that information typically takes in any conscious act of making use of it. Namely, information is first acquired (from some kind of source), then (possibly) transmitted and, finally, used to solve some problem. One can therefore speak of the three basic links of the full information chain (see Fig. 1). In this picture, classical Information Theory is the theory of the middle link. In most applications, the middle link has the convenient property of being amenable to study in isolation from the two "end" links of the full information chain. By contrast, it appears that the end links have to be studied together. Indeed, the problem being solved would in general dictate the particular kind of information that needs to be acquired from an information source. Conversely, the "knowledge structure" of the particular source would typically affect the potential solution quality of the given problem.

FIG. 2. Full information chain and information attributes.

One can now connect the two points of view (the "information attributes" and the "information path" ones) to arrive at the schematic picture illustrated in Fig. 2. With information quantity being the only attribute associated with the middle link of the information chain, that link should allow for a "cleaner" treatment, which was indeed undertaken in classical Information Theory. On the other hand, the two end links of the information chain are associated with the information accuracy and relevance attributes, respectively (with information quantity also playing a role), and have to be studied simultaneously. This potentially complicates the analysis, necessitating, in particular, a joint consideration of problems and information sources as well as modeling of the "knowledge structure" of the latter.
This article's main purpose is to initiate a systematic study of the two end links of the full information chain. More specifically, we begin with the first (information acquisition) link, with the treatment of the third link to follow in future publications. Since, as was mentioned earlier, the two links are closely connected, the main practical results (such as algorithms for optimal information acquisition for a particular problem) will appear in these later publications. Still more specifically, the main subject of the present article is the ways of soliciting specific information from an information source and the modeling of the expected accuracy of the information obtained thereby.

The discussion begins with a problem that the agent wishes to solve. To make it more specific, we assume that the problem is characterized by a well-defined real-valued objective function that has to be minimized. The agent is also assumed to have only incomplete information about some parameters of the problem, hence the need for additional information. Incomplete information is described self-consistently via probabilities; therefore the initial information available to the agent is assumed to take the form of a certain probability measure on the problem parameter space, which he or she seeks to update by soliciting additional information from an information source. The latter process can be given the appropriate structure by enabling the agent to formulate specific requests for information, or questions, to the information source; these are understood as descriptions of the specific kind of additional information the agent would like to obtain from the source. In turn, the source should be assumed to be able to provide appropriately structured answers to the agent's questions that fulfil the requests for information contained in them. The questions should in principle be selected to maximize the impact of the corresponding answers on the (appropriately measured) problem solution quality. To achieve this goal, the questions have to be selected in such a way that, on the one hand, the source is able to provide accurate answers and, on the other hand, the specific information requested in the questions is relevant to the particular problem. While, as mentioned earlier, the issue of relevance will be treated in later publications, this article has accuracy as its main concern. To address the latter, note that, in general, a source would answer some questions more accurately than others. In other words, different questions appear to possess different degrees of difficulty to the source, so that the more difficult questions are usually answered with lower accuracy. The agent's goal of finding questions that would be given accurate answers then becomes that of determining questions of sufficiently low difficulty to the source.

Looking at this issue from a slightly different angle, one can say that if the agent's goal is to select questions that are preferred from the point of view of the accuracy of the answers the source can provide in response to them, then it would be convenient to be able to characterize each question with a real number whose value is directly related to the expected accuracy of the source's answer to this question. The requirement of characterizing each question with a single real number can be thought of as simply a design criterion that stems from the agent's desire to be able to completely order all questions with respect to the expected accuracy of the corresponding answers.

In this article, we introduce a class of questions that can have some relevance to the problem of interest to the agent. We then proceed to develop a real-valued measure of question difficulty, the question difficulty functional. To determine the form of the latter, it appears reasonable to begin with a set of requirements the functional has to satisfy. Such requirements can express both consistency properties and assumed symmetries of the "knowledge structure" of the source. The latter is understood as a quantitative description of the source's strengths and weaknesses with regard to its ability to provide accurate answers to the agent's requests for different types of information pertaining to the problem being solved. Depending on the degree of symmetry exhibited by the source's knowledge structure, one can obtain more or less elaborate forms of the question difficulty functional that require various geometric objects for their full description. It appears likely that there will arise a hierarchy of knowledge structure models such that a less complicated (e.g. more symmetrical) model can be obtained from a more complicated one as a limiting case (e.g. when an additional symmetry becomes valid). The specific choice of a suitable model for the source would have to be made based on experience, with more complicated models replacing simpler ones if sufficient evidence is found that the source's knowledge structure cannot be adequately described by the latter. In this article, we limit ourselves to a linear isotropic knowledge structure model, postponing the discussion of more elaborate models to future publications.



A. Related work 



This and the following papers can be described as an attempt to bring information theory to bear on optimization and decision making. As is well known, the field of Information Theory that grew out of Shannon's pioneering work on communication theory has since had a profound impact on a number of disciplines in the natural sciences and engineering. One of the fundamental advances brought by Information Theory is the concepts of entropy and mutual information, which provide a natural and consistent measure of the amount of information associated with general probability distributions. Besides a revolution in communications, which started from the demonstration that error-free transmission over an imperfect channel was possible and gave rise to modern coding theory, the list of successful applications of these concepts includes (but is not limited to) a simple derivation of the laws of statistical physics [1, 2], new algorithms in computer vision [3], new methods of analysis in climatology [4, 5], physiology [6] and neurophysiology [7]. The latter were based on the concept of transfer entropy [8], which can be interpreted as conditional mutual information defined for time series and can be used to measure information transfer between different parts of complex systems. The relatively new field of Generalized Information Theory (see e.g. [9]) is concerned with problems of characterizing uncertainty in frameworks that are more general than classical probability, such as Dempster-Shafer theory [10]. There it was shown, for example, that the minimal uncertainty measure satisfying consistency requirements such as general subadditivity and additivity for combining uncertainty for independent subsystems is obtained by maximizing Shannon entropy over all classical probability distributions consistent with the given belief specification.

In this paper, we use an axiomatic approach to determine the overall form of the question difficulty functional. In the context of classical Information Theory, the axiomatic approach was used not only by Shannon himself but also in [13] to derive the most general form of the entropy function. Later, Renyi used a different set of axioms [3] to find the one-parameter family of functions (later called Renyi entropies) that includes the standard (Shannon) entropy as a special case. The concept of structural entropy was introduced in [15] and used for classification purposes. The entropy of [15] (known as Havrda-Charvat entropy) was relatively recently obtained by axiomatic means in [16], where the axiomatization of partition entropy was discussed on rather general grounds (see also [17] for closely related work). It was shown in [16] that Shannon entropy, Havrda-Charvat entropy and the Gini index are all obtained as particular cases of a general partition entropy that satisfies a system of reasonable axioms.

A concerted application of information methods to problems of fundamental physics, which goes back to the pioneering work of Jaynes, gave rise to the new field of Information Physics, which has produced a number of intriguing results in recent years (see [18] for a good review). The overall guiding idea behind the field is that the laws of physics are in essence the laws of inductive inference applied to physical systems. That many of the known laws of physics can be derived in this way has been shown for thermodynamics [1], quantum [13] and classical mechanics, and (recently) relativistic quantum theory [21]. The Information Physics approach shifts the emphasis in deriving the laws of physics to the determination of the correct degrees of freedom and the relevant information that needs to be known for a full description of the system in question. A prototypical example would be an ideal gas in thermal equilibrium with a large bath. A full thermodynamical description of it can be obtained from the knowledge that the gas consists of molecules and that it is characterized by a definite value of the average energy per molecule, which is set by the temperature. Once this is known, all measurable results can be obtained by an application of inductive inference rules in the form of the ME method, which relies on maximization of relative entropy subject to the constraints expressing the relevant information. The ME method itself can be looked upon as a generalization and refinement of Jaynes' original MaxEnt method.

While the ME method addresses the problem of information processing in application to the physical sciences, the issue of information quantification in Information Physics has been addressed from the point of view of order theory in [22–24]. This particular research direction goes back to the ideas of Cox on probability. It is argued in [22] that order plays a more fundamental role than measure and that both probability and Shannon entropy can be derived as natural valuations on distributive lattices of assertions and questions, respectively. One of the successful applications of the order-theoretic approach to fundamental physics is the recent derivation [28] of the Lorentz transformations and the Minkowski metric of special relativity directly from consideration of the partial order of events in space-time.

The approach proposed in the present and follow-up articles is rather closely related to developments in Information Physics. At a higher level, the proposed approach seeks to establish a general framework for information use in quantitative decision making in a wide variety of settings. Information Physics pursues a similar goal in application to the physical sciences: it seeks to establish the general role of information in the fundamental laws of nature. A bit more specifically, as mentioned earlier, while classical Information Theory deals with information quantity, Information Physics is mostly concerned with information relevance for the given physical system. The proposed approach deals with information relevance for the problem being solved and with the expected accuracy of information with respect to the "true" state. It should also be mentioned that the question difficulty functional developed in this article can be looked upon as a generalization of the question relevance measure discussed in [22–24], which quantifies the degree to which a partition question resolves the central issue (the most detailed partition question). In fact, we suspect that the difficulty functional of the present article can be derived using the order-theoretic approach of [22], as a valuation on the lattice of questions that is allowed to depend, besides the probabilities of the corresponding assertions, on suitable geometric structures on the problem parameter space. This issue is currently under investigation.

The main motivation for this and follow-up papers was the authors' desire to develop a methodology for the optimal use of (additional) information in decision making under uncertainty. This idea is obviously not entirely new, and it has been studied and used, for instance, in the area of statistical decision making. Applications to innovation adoption [30, 31], fashion decisions [32] and vaccine composition decisions for flu immunization [33] can be mentioned in this regard. It is interesting to observe that the amount of information in these applications is typically measured simply as the number of relevant observations, which can be either costless or costly, depending on the model. Some authors [34, 35] introduced various models (e.g. the effective information model) for accounting for the actual, or effective, amount of information contained in the received observations. The common theme of this line of work is to try to find an optimal trade-off between the amount of additional information obtained and the suitably measured degree of achieving the original goal. Thus, for instance, in [33], waiting longer allows the decision makers to obtain a more precise forecast of which flu virus strains are going to be predominant but leaves less time for actual vaccine production. The main difference of the approach initiated by this paper is that it allows one to optimize not only the quantity of the acquired information but also its content, and that it explicitly accounts for the properties of information sources.



Explicit consideration of information sources, which lies at the core of the proposed methodology, is similar in spirit to analyzing and using information provided by human experts. In fact, in many practically relevant applications, the role of the multi-purpose information sources used in the proposed approach will likely be played by human experts. In the existing research literature, the problem of optimal usage of information obtained from human experts has been addressed mostly in the form of updating the decision maker's beliefs given probability assessments from multiple experts [36–39] and, in particular, optimal combining of expert opinions, including experts with incoherent and missing outputs [40]. Investigations on using and combining information from experts who partition the event differently [41] and on rules for updating probabilities based on outcomes of partially similar events [42] should also be mentioned in this regard. The latter investigations consider experts that provide qualitatively different information. The dependence of the quality of experts' output on the particular partition was also studied in [2]. In the approach developed in this and subsequent papers, the emphasis is on explicit modeling of the experts' knowledge structure and on optimizing the particular type of information requested for the given expert(s) and the given decision making problem.
B. Outline

The rest of the article is organized as follows. Section II contains the necessary preliminaries, including a short discussion of information relevance. In Section III the adopted model of information acquisition is described and, in particular, a definition of a question to be used later is given. Section IV is devoted to a discussion of the question difficulty functional, the main topic of the present article. In particular, the main theorem establishing the overall form of the (isotropic) question difficulty functional required to satisfy certain reasonable postulates is proven. In Section V relationships between different questions are explored. Section VI contains simple numerical examples illustrating the results obtained earlier in the article. Finally, a conclusion summarizing the main results is given in Section VII.

II. PRELIMINARIES 

As mentioned in Section I, the proposed framework derives information relevance from (broadly understood) decision making problems: information is considered more relevant if it improves the corresponding decision quality to a larger extent. As the present article is devoted to the accuracy characteristic of information, we will consider the relevance aspect of information in detail in later publications. As far as the particular decision making problem is concerned, two of its aspects are important for our goals: the "uncertainty" space and the loss functional. The former supplies the description of both the initial information and the additional information provided by an information source. The latter quantifies the quality of the decision making problem solution and will be dealt with in later publications.

A useful picture to have in mind while reading the rest of this article is that of an agent asking questions of an information source, with the latter providing answers. The source is in general not capable of providing perfectly accurate answers to the agent's questions. One could say, informally, that the source is not 100% trustworthy. The source's answers can, in fact, range from perfectly accurate to vacuous (some details are discussed in the next section). The question difficulty, the main subject of the present paper, is a useful construct that can help the agent predict the degree of accuracy of the source's answer to a given question. As was mentioned in Section I, it assigns to each question a real number that gives the degree of difficulty of the question to the source and, as such, determines the degree of (expected) accuracy of the source's answer to it. A detailed characterization of the properties of the source's answers and of its capabilities is given in the companion paper [44].

Let $\Omega$ be the base space, understood as the space of possible values of those input data parameters of the problem the agent is solving that are not known with certainty. It is often referred to as a parameter space. $\Omega$ can be finite or infinite, such as a closed subset of a Euclidean space $\mathbb{R}^n$. We denote by $\mathcal{F}$ a suitable sigma-algebra on $\Omega$. $P$ is a probability measure on $(\Omega, \mathcal{F})$ that describes the initial state of information available to the agent and that can be modified by querying information sources. We often refer to it as a measure on $\Omega$, omitting an explicit specification of $\mathcal{F}$ unless needed. In order to formalize the process of extracting information from sources, we will need to describe questions. The latter task necessitates the use of various subsets of the parameter space $\Omega$ and collections of such subsets.

A generic collection of (distinct) subsets $\mathcal{C} = \{C_1, \dots, C_r\}$ will be called inclusion-free if for any $C_j, C_l \in \mathcal{C}$ neither of the two is a proper subset of the other. A collection $\mathcal{C} = \{C_1, \dots, C_r\}$ will be called complete if it fully covers $\Omega$, i.e. $\bigcup_{j=1}^{r} C_j = \Omega$.
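For a finite base space, both properties can be checked mechanically. The following is a minimal Python sketch, assuming subsets of $\Omega$ are represented as Python sets of state labels; the helper names `is_inclusion_free` and `is_complete` are hypothetical and introduced here only for illustration.

```python
from itertools import combinations

def is_inclusion_free(collection):
    """True if no set in the collection is a proper subset of another one."""
    sets = [frozenset(c) for c in collection]
    for a, b in combinations(sets, 2):
        # For frozensets, '<' tests the proper-subset relation.
        if a < b or b < a:
            return False
    return True

def is_complete(collection, omega):
    """True if the union of the collection covers the whole base space omega."""
    covered = frozenset().union(*collection)
    return covered == frozenset(omega)

# Overlapping sets that are inclusion-free and cover omega = {1, 2, 3, 4}.
example = [{1, 2}, {2, 3}, {3, 4}]
print(is_inclusion_free(example), is_complete(example, {1, 2, 3, 4}))  # True True
```

The example illustrates that a collection can be inclusion-free and complete without being a partition, since its sets are allowed to overlap.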

Of particular interest to us will be collections of subsets $\mathcal{C}$ that are partitions of $\Omega$, meaning that all sets in $\mathcal{C}$ are non-overlapping, i.e. $C_j \cap C_l = \emptyset$ for all $j \neq l$. Note that our definition of a partition differs somewhat from the standard one in that it does not require completeness, which is expressed by the relation $\bigcup_{j=1}^{r} C_j = \Omega$. We refer to partitions that satisfy the completeness requirement as complete partitions and to those that do not satisfy it as incomplete partitions. For any partition $\mathcal{C}$, we will use the notation $\bar{\mathcal{C}} = \bigcup_{j=1}^{r} C_j$. Clearly, partition $\mathcal{C}$ is complete if and only if $\bar{\mathcal{C}} = \Omega$.
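The distinction between complete and incomplete partitions can be sketched in a few lines of Python, again assuming a finite $\Omega$ whose subsets are plain Python sets. The helpers `is_partition`, `union_of` (computing $\bar{\mathcal{C}}$) and `classify` are hypothetical names chosen for this illustration.

```python
from itertools import combinations

def is_partition(collection):
    """Pairwise non-overlapping subsets; completeness is NOT required."""
    sets = [frozenset(c) for c in collection]
    return all(a.isdisjoint(b) for a, b in combinations(sets, 2))

def union_of(collection):
    """The set C-bar: the union of all cells of the collection."""
    return frozenset().union(*collection)

def classify(collection, omega):
    """Label a collection as a complete partition, an incomplete one, or neither."""
    if not is_partition(collection):
        return "not a partition"
    if union_of(collection) == frozenset(omega):
        return "complete partition"
    return "incomplete partition"

omega = {"a", "b", "c", "d"}
print(classify([{"a"}, {"b", "c"}, {"d"}], omega))  # complete partition
print(classify([{"a"}, {"b", "c"}], omega))         # incomplete partition
print(classify([{"a", "b"}, {"b", "c"}], omega))    # not a partition
```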

A partition $\mathcal{C}'$ will be called a refinement of $\mathcal{C}$ if every set from $\mathcal{C}'$ is a subset of some set from $\mathcal{C}$. In such a case, $\mathcal{C}$ is a coarsening of $\mathcal{C}'$. Given a measure $P$ on $\Omega$, we call a partition $\mathcal{C}_f(P)$ a finest partition of $\Omega$ associated with the measure $P$ if $P(C) > 0$ for all $C \in \mathcal{C}_f(P)$ and any proper refinement of $\mathcal{C}_f(P)$ contains at least one set of zero measure. In case $\Omega$ is a closed subset of a Euclidean space and $\mathcal{F}$ is the Borel sigma-algebra, it is easy to see that finest partitions do not exist if the measure $P$ has continuous support or has a component with continuous support. It is also clear that if the measure $P$ has discrete support, there exist many partitions of $\Omega$ that are finest for $P$.

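Let $\mathcal{C}' = \{C'_1, \dots, C'_r\}$ and $\mathcal{C}'' = \{C''_1, \dots, C''_s\}$ be two partitions of $\Omega$. Then the partition $\mathcal{C} = \mathcal{C}' \cap \mathcal{C}''$ is defined as the partition that consists of all sets of the form $C'_j \cap C''_l$: $\mathcal{C}' \cap \mathcal{C}'' = \{C'_1 \cap C''_1, C'_1 \cap C''_2, \dots, C'_r \cap C''_s\}$ (see Fig. 3 for an illustration). Obviously, some of the sets constituting the partition $\mathcal{C}' \cap \mathcal{C}''$ may be empty. Clearly, the partition $\mathcal{C}' \cap \mathcal{C}''$ is a refinement of both $\mathcal{C}'$ and $\mathcal{C}''$.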
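The joint partition and the refinement relation can be illustrated with a small Python sketch on a finite $\Omega$. The helper names `joint_partition` and `is_refinement` are hypothetical; empty intersections are deliberately kept, mirroring the remark that some sets of $\mathcal{C}' \cap \mathcal{C}''$ may be empty.

```python
def joint_partition(c1, c2):
    """All pairwise intersections C'_j & C''_l in row-major order (some may be empty)."""
    return [frozenset(a) & frozenset(b) for a in c1 for b in c2]

def is_refinement(fine, coarse):
    """True if every set of `fine` is a subset of some set of `coarse`."""
    return all(any(frozenset(f) <= frozenset(c) for c in coarse) for f in fine)

c1 = [{1, 2}, {3, 4}]
c2 = [{1, 3}, {2, 4}]
joint = joint_partition(c1, c2)
print(joint)  # [frozenset({1}), frozenset({2}), frozenset({3}), frozenset({4})]
print(is_refinement(joint, c1), is_refinement(joint, c2))  # True True
```

Since the empty set is a subset of every set, empty cells never break the refinement check; they can be filtered out with `[s for s in joint if s]` if preferred.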

FIG. 3. Two partitions of $\Omega$ and the corresponding joint partition.

FIG. 4. Partition $\mathcal{C}_D$ of set $D \subset \Omega$ induced by a partition $\mathcal{C}$ of $\Omega$.

If $D \subset \Omega$ and $\mathcal{C} = \{C_1, \dots, C_r\}$ is a partition of $\Omega$, the partition $\mathcal{C}_D = \{D \cap C_1, \dots, D \cap C_r\}$ of $D$ will be called the partition of $D$ induced by the partition $\mathcal{C}$ of $\Omega$ (see Fig. 4).
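On a finite $\Omega$ the induced partition $\mathcal{C}_D$ is just a cell-by-cell intersection. A minimal sketch follows; `induced_partition` is a hypothetical helper name.

```python
def induced_partition(d, partition):
    """Partition C_D = {D & C_1, ..., D & C_r} of D induced by a partition of Omega."""
    d = frozenset(d)
    return [d & frozenset(c) for c in partition]

partition = [{1, 2}, {3, 4}, {5}]  # a complete partition of Omega = {1, ..., 5}
print(induced_partition({2, 3}, partition))
# [frozenset({2}), frozenset({3}), frozenset()]
```

When the partition of $\Omega$ is complete, the cells of $\mathcal{C}_D$ cover all of $D$, so $\mathcal{C}_D$ is a complete partition of $D$.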

Let $C \in \mathcal{F}$ be a measurable subset of $\Omega$ with $P(C) > 0$. We denote by $P_C$ the conditional measure on $\Omega$ defined as

$$P_C(D) = \frac{P(D \cap C)}{P(C)} \qquad (1)$$

for arbitrary $D \in \mathcal{F}$.
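For a finite $\Omega$, Eq. (1) can be evaluated exactly with rational arithmetic. The sketch below assumes $P$ is given by point masses stored in a dictionary; the helpers `measure` and `conditional_measure` are hypothetical names, and the numbers are made up for illustration.

```python
from fractions import Fraction

def measure(p, subset):
    """P(subset) for a measure given by point masses p: dict state -> mass."""
    return sum((p[w] for w in subset), Fraction(0))

def conditional_measure(p, c):
    """Point masses of P_C, i.e. P_C({w}) = P({w} & C) / P(C), as in Eq. (1)."""
    pc = measure(p, c)
    if pc == 0:
        raise ValueError("P(C) must be positive")
    return {w: (p[w] / pc if w in c else Fraction(0)) for w in p}

p = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
p_c = conditional_measure(p, {"a", "b"})
print(measure(p_c, {"b", "c"}))  # 1/3
```

Here $P_C(\{b, c\}) = P(\{b\}) / P(\{a, b\}) = (1/4)/(3/4) = 1/3$, in agreement with Eq. (1).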

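For an arbitrary complete partition $\mathcal{C} = \{C_1, \dots, C_r\}$, it is straightforward to show that the following decomposition of the measure $P$ into the corresponding conditional measures

$$P = \sum_{j=1}^{r} P(C_j)\, P_{C_j} \qquad (2)$$

is valid (terms with $P(C_j) = 0$, for which $P_{C_j}$ is undefined, are simply omitted).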
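The decomposition (2) can be checked numerically on a finite $\Omega$. This is a minimal sketch, assuming point masses stored as exact fractions; the helper `recompose` is a hypothetical name, and the second call shows that the identity generally fails for an incomplete partition.

```python
from fractions import Fraction

def measure(p, subset):
    """P(subset) for a measure given by point masses p: dict state -> mass."""
    return sum((p[w] for w in subset), Fraction(0))

def conditional_measure(p, c):
    """Point masses of P_C as in Eq. (1); assumes P(C) > 0."""
    pc = measure(p, c)
    return {w: (p[w] / pc if w in c else Fraction(0)) for w in p}

def recompose(p, partition):
    """Evaluate sum_j P(C_j) * P_{C_j} point by point, as in Eq. (2)."""
    total = {w: Fraction(0) for w in p}
    for c in partition:
        weight = measure(p, c)
        if weight == 0:
            continue  # P_{C_j} is undefined; the term contributes nothing
        p_c = conditional_measure(p, c)
        for w in p:
            total[w] += weight * p_c[w]
    return total

p = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 4), 4: Fraction(0)}
print(recompose(p, [{1, 2}, {3}, {4}]) == p)  # True: complete partition
print(recompose(p, [{1, 2}]) == p)            # False: incomplete partition
```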
III. INFORMATION ACQUISITION

In the model of information acquisition we adopt, an information source is assumed to be capable of providing answers to questions. The accuracy of the source's answers generally varies with the nature of the question: a given source can answer some questions more accurately than others. This difference in answer accuracy reflects the information source's knowledge structure, which we model by assigning a real number to each possible question a source can receive. This number is naturally termed the question difficulty, meaning that the source can provide less accurate answers to questions of higher difficulty. Just as a question can be characterized by its difficulty, an answer to it can be characterized by its depth, with more accurate answers to the same question having a higher depth value. We address question difficulty here, postponing a discussion of answer depth to a follow-up paper [44]. First, we need to define what a question is.



A. Questions 

A question is a request for new information beyond what is already known. The latter is represented by the probability measure $P$ on the parameter space $\Omega$ and is assumed to be common knowledge. One can think of $\Omega$ as the set of all possible states of the system in question. Let us assume, for simplicity, that $\Omega$ is a finite discrete set of cardinality $N$. This assumption is not going to be limiting, as one can always apply a suitable discretization to a continuous parameter space $\Omega$. Since there is typically no order imposed on the elements $\omega$ of $\Omega$, they form an antichain. Then, as shown in [22], the space of assertions over the elements of $\Omega$ can be represented as a Boolean lattice [45] $\mathcal{A}_N$, where the ordering relation $\le$ is identified with logical implication ($x \le y$ if and only if $x$ implies $y$) and the lattice operations of join and meet are identified with logical disjunction and conjunction, respectively.
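For a finite $\Omega$, a convenient concrete model of $\mathcal{A}_N$ encodes each assertion as the subset of states on which it holds, with the empty set playing the role of the absurd assertion. Under this assumed encoding, implication becomes set inclusion, join becomes union and meet becomes intersection. The helper names below are hypothetical.

```python
from itertools import chain, combinations

def assertion_lattice(omega):
    """All subsets of omega; subset S encodes the assertion 'the state lies in S'."""
    items = sorted(omega)
    return [frozenset(s) for s in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]

def implies(x, y):
    """Ordering relation: x <= y iff assertion x implies assertion y."""
    return x <= y

def join(x, y):
    """Logical disjunction corresponds to set union."""
    return x | y

def meet(x, y):
    """Logical conjunction corresponds to set intersection."""
    return x & y

lattice = assertion_lattice({"w1", "w2", "w3"})
print(len(lattice))  # 8 = 2**3 assertions, including the empty (absurd) one
```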


A question was originally defined by Cox [27] as a set of logical assertions that answer it fully. Using this definition, one can construct [22] a lattice of questions $\mathcal{Q}_N$ from the Boolean lattice $\mathcal{A}_N$ of logical assertions. The set of ideal questions, which, if ordered by set inclusion, forms a lattice isomorphic to $\mathcal{A}_N$, is built from the down-sets of the elements of $\mathcal{A}_N$. More generally, the set of all questions comprising $\mathcal{Q}_N$ can be formed by taking all distinct down-sets of subsets of $\mathcal{A}_N$ (see [22] for details). As was mentioned above, $\mathcal{Q}_N$ naturally becomes a lattice where the ordering relation $\le$ is identified with "answering" (i.e. $x \le y$ if and only if any full answer to $x$ also fully answers $y$).
It is easy to see from the construction outlined above that, in the case when the system states form an antichain $\Omega$, questions can be identified with all possible inclusion-free collections of subsets of $\Omega$. In particular, ideal questions are identified with individual subsets of $\Omega$. The central issue was defined in [22] as the down-set of the union of all atoms of $\mathcal{A}_N$ (the logical assertions corresponding to individual elements of $\Omega$). All questions that lie above the central issue in $\mathcal{Q}_N$ (i.e. questions that are fully answered by any logical assertion fully answering the central issue) were called real questions by Cox and also in [22]. The set of real questions forms a sublattice of $\mathcal{Q}_N$, with the smallest real question (the bottom of the sublattice) being the central issue. It can easily be seen that, in terms of collections of subsets of $\Omega$, all real questions are complete collections, with the central issue being identified with the finest partition of $\Omega$ (the partition into singletons). Note that only one ideal question, the one corresponding to the whole of $\Omega$, belongs to the set of real questions. In the following, we will concentrate on questions corresponding to both complete and incomplete partitions of $\Omega$, which we refer to as complete partition questions and incomplete partition questions, respectively.



All questions of the former type (called simply partition questions by Knuth) are real questions, while those of the latter type belong to the set of vain questions, the complement of the set of real questions inside $\mathcal{Q}_N$, as defined by Cox. Note that incomplete partition questions will play a mostly auxiliary role, with all the practically important examples being of the complete partition (and hence real) variety. Thus we adopt the following definition.

Definition: A question is a partition $\mathcal{C} = \{C_1, C_2, \dots, C_r\}$, where $C_j$, $j = 1, \dots, r$, are subsets of $\Omega$ such that $C_i \cap C_j = \emptyset$ for $i \neq j$ and $\bigcup_{j=1}^{r} C_j \subseteq \Omega$.

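For any partition $\mathcal{C}$ we denote the union of all subsets in $\mathcal{C}$ by $\bar{\mathcal{C}}$: $\bar{\mathcal{C}} = \bigcup_{j=1}^{r} C_j$. Thus, for any complete partition $\mathcal{C}$, $\bar{\mathcal{C}} = \Omega$.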

In everyday terms, a complete partition question (which is always real in Cox's terminology) can be interpreted as a traditional multiple-choice question (e.g. "Is the apple red, green or yellow?"). Note that in most applications one would typically be dealing with questions that are above (actually, quite often well above) the central issue in the $\mathcal{Q}_N$ lattice. One can say that such questions do not request the fullest possible information about the system (problem) in question. For example, the question just mentioned does not request any information about the apple's size. Observe that our definition of a question is close to that proposed in [49], since any (complete) partition $\mathcal{C}$ of the base space induces the corresponding probability distribution $P(\mathcal{C}) = (P(C_1), \dots, P(C_r))$, and thus the question $\mathcal{C}$ can be interpreted as a probability distribution which, in turn, since it represents the agent's incomplete information, can be thought of as a request for missing information.
Incomplete partition questions are typically more difficult to verbalize and interpret. Knuth [22] refers to them as auxiliary constructs, similar to negative numbers, which are not directly used for counting but can be very useful to assist it. We will discuss this issue in more detail later in the paper, when the notion of an answer is introduced.

Since the only types of questions we consider in this and follow-up articles are partition questions (complete or incomplete), we will often use the terms "partition" and "question" interchangeably. In particular, we will often say "complete question" and "incomplete question" to refer to a complete partition question and an incomplete partition question, respectively. Following established terminology, we will also refer to incomplete partition questions whose partitions consist of a single subset as ideal questions.



B. Answers 

Recall that Cox [27] defined a question as the collection of logical assertions that provide a (full) answer to it. Accordingly, given a question, an answer would have to be identified with any one of these logical assertions. Note, in particular, that in this picture answers that provide more information than the question requests are perfectly admissible. For example, if the question reads "Is the apple green or not?", the answer stating that the apple is red would be a valid answer to the question. In this and the following articles, we take a somewhat different view, in which an answer to a given question provides no more than the requested information - as opposed to no less in Cox's picture. Informally speaking, a typical answer to the question mentioned above would be "The apple is more likely to be green than not", with different degrees of confidence in the stated assertion possible. In particular, one of the key points of our approach is the explicit consideration of imperfect answers, coupled with quantification of the degree of accuracy of such answers.

While a detailed discussion of answers and their accuracy-related properties will be given in the companion paper [44], we present some of it here since it is needed to clarify the notion of questions and, especially, incomplete - including ideal - questions. We begin with perfect answers.

Definition: Given a question $C = \{C_1, \ldots, C_r\}$, the perfect answer $V^*(C)$ is a message that takes one of the values in the set $\{s_1, \ldots, s_r\}$ such that reception of the value $s_k$ of the message modifies the original probability measure $P$ on $\Omega$ into the measure $P'_k$ such that $P'_k(C_j) = \delta_{kj}$ and $P'_{k,C_j} = P_{C_j}$ for all subsets $C_j$ in $C$, where $P_{C_j}$ denotes the conditional measure $P(\cdot \mid C_j)$.

Informally speaking, a perfect answer to $C$ completely resolves the uncertainty associated with the partition $C$, i.e. it places a random outcome $\omega$ in one of the subsets in $C$ with certainty but otherwise does no more (since it leaves the conditional measures $P_{C_j}$ unchanged). Let us now generalize the definition of an answer to include the possibility of source errors. We assume, without loss of generality, that $P(C_j) > 0$ for all subsets $C_j$ in the partition $C$.

Definition: An answer to the question $C = \{C_1, \ldots, C_r\}$ is a message $V(C)$ that takes values in the set $\{s_1, s_2, \ldots, s_m\}$ such that reception of the value $s_k$ of the message updates the initial measure $P$ on $\Omega$ to the measure $P'_k$ such that either $P'_k(C_j) = 0$ or $P'_{k,C_j} = P_{C_j}$ for all $k = 1, \ldots, m$ and all $j = 1, \ldots, r$.

The difference between these two definitions is that the condition $P'_k(C_j) = \delta_{kj}$, which makes an answer perfect, is not required in the general case. Moreover, the number of values a general answer can take does not have to be equal to the number of subsets in the partition $C$. Informally speaking, this describes the possibility of the source conveying different "gradations of belief" in various assertions. For instance, if the question is "Is it an apple, a pear or a peach?", the source could give an answer of the sort "Almost surely an apple." or "More than likely an apple." or something of the kind. Alternatively, if the answer is not required or assumed to be perfect, the source could give just the traditional "assertion-type" answers, but their accuracy could be less than 100%. For example, if, according to the initial measure $P$, apple, pear and peach are equally likely, the probability that the fruit is really an apple given the answer "Apple." by the source could be found to be equal to 0.6. By contrast, for a perfect answer to this question, the probability of the fruit being an apple following the answer "Apple." by the source has to be equal to 1 by definition.

It is straightforward to show that for $V(C)$ to be an answer to a complete question $C$ according to this definition, it is necessary and sufficient for the updated measures $P'_k$, $k = 1, \ldots, m$, to take the form

$$P'_k = \sum_{j=1}^r p_{kj} P_{C_j}, \tag{3}$$

where $p_{kj}$, $k = 1, \ldots, m$, $j = 1, \ldots, r$, are nonnegative coefficients (in fact, $p_{kj} = P'_k(C_j)$) such that $\sum_{j=1}^r p_{kj} = 1$ for $k = 1, \ldots, m$. If the corresponding answer is perfect, $p_{kj} = \delta_{kj}$. In the following we denote the probability that the answer $V(C)$ takes the value $s_k$ by $v_k$.
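As an illustration only, the sketch below builds the updated measures of (3) on a finite base space, where the conditional measure $P_{C_j}$ is simply $P$ renormalized on $C_j$. The helper names `conditional` and `updated_measures`, the six-atom space and all numbers are hypothetical; the checks confirm that each $P'_k$ is a probability measure and that, inside every $C_j$, the relative weights of the atoms are left unchanged, as the definition of an answer requires.

```python
def conditional(P, S):
    """Conditional measure P_S on a finite space: P(w)/P(S) on S, zero elsewhere."""
    ps = sum(P[w] for w in S)
    return [P[w] / ps if w in S else 0.0 for w in range(len(P))]

def updated_measures(P, C, p):
    """Eq. (3): P'_k = sum_j p[k][j] * P_{C_j}, one measure per answer value s_k.
    Assumes each row p[k] is nonnegative and sums to one, and that P(C_j) > 0."""
    conds = [conditional(P, Cj) for Cj in C]
    return [[sum(pk[j] * conds[j][w] for j in range(len(C))) for w in range(len(P))]
            for pk in p]

# Hypothetical numbers: six atoms, C = {apple, pear, peach}, two answer values.
P = [0.1, 0.2, 0.3, 0.1, 0.2, 0.1]
C = [{0, 1}, {2, 3}, {4, 5}]
p = [[0.6, 0.2, 0.2],    # s_1: "more likely an apple"
     [0.1, 0.45, 0.45]]  # s_2: "probably not an apple"
Pk = updated_measures(P, C, p)
```

Passing the identity matrix as `p` reproduces a perfect answer, for which $P'_k = P_{C_k}$.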

If question $C$ is complete and $V(C)$ is a corresponding answer, it is reasonable to assume that $V(C)$ does not change the original measure $P$ on average or, in other words, that the original measure $P$ is a "valid" one that only gets "refined" by an answer to $C$. Formally speaking, this assumption means that

$$\sum_{k=1}^m v_k P'_k = P, \tag{4}$$

from which it follows (evaluating both sides on $C_j$) that $\sum_{k=1}^m v_k p_{kj} = P(C_j)$ and, in particular, that if the answer is perfect, then $v_j = P(C_j)$. We refer to (4) as the consistency with prior condition for the answer $V(C)$.
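One convenient way to generate answers that satisfy (4), assumed here purely for illustration, is to start from a hypothetical "channel" $T_{jk} = \Pr(s_k \mid \omega \in C_j)$ and obtain $v_k$ and $p_{kj}$ by Bayes' rule. The sketch below (hypothetical helper names, invented numbers) checks numerically that the mixture $\sum_k v_k P'_k$ reproduces $P$, and that an identity channel, i.e. a perfect answer, gives $v_j = P(C_j)$.

```python
def conditional(P, S):
    """Conditional measure P_S: P(w)/P(S) on S, zero elsewhere."""
    ps = sum(P[w] for w in S)
    return [P[w] / ps if w in S else 0.0 for w in range(len(P))]

def answer_from_channel(P, C, T):
    """Hypothetical construction of an answer from T[j][k] = Pr(s_k | w in C_j).
    Returns v[k] = Pr(s_k) and p[k][j] = Pr(w in C_j | s_k); assumes every v[k] > 0."""
    PC = [sum(P[w] for w in Cj) for Cj in C]
    joint = [[PC[j] * T[j][k] for j in range(len(C))] for k in range(len(T[0]))]
    v = [sum(row) for row in joint]
    p = [[x / vk for x in row] for row, vk in zip(joint, v)]
    return v, p

def prior_mixture(P, C, v, p):
    """Left-hand side of eq. (4): sum_k v_k P'_k, with P'_k given by eq. (3)."""
    conds = [conditional(P, Cj) for Cj in C]
    return [sum(v[k] * p[k][j] * conds[j][w]
                for k in range(len(v)) for j in range(len(C)))
            for w in range(len(P))]

P = [0.1, 0.2, 0.3, 0.1, 0.2, 0.1]
C = [{0, 1}, {2, 3}, {4, 5}]
T = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]  # a noisy source
v, p = answer_from_channel(P, C, T)
mix = prior_mixture(P, C, v, p)  # should reproduce P
```

Answers built this way are consistent with the prior by construction, because $\sum_k v_k p_{kj} = P(C_j) \sum_k T_{jk} = P(C_j)$.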

Since incomplete and, in particular, ideal questions will be used - if only in an auxiliary sense - in the next section, we find it appropriate to discuss their interpretation in some more detail. In the following, we use the term correct answer for a question (complete or incomplete) to denote a full description of one of the subsets in the corresponding partition. Thus, a question of the form $C = \{C_1, \ldots, C_r\}$ has a total of $r$ correct answers. A correct answer can be thought of as a particular "state of the world" which becomes known only after the particular random outcome $\omega$ from the parameter space is observed. A correct answer to a question is in general uncertain at the time the agent needs to make a decision, unless a source capable of producing a perfect answer to the corresponding question is available. Without excessive abuse of terminology, one can say that a perfect answer and a correct answer relate to each other as a random variable and its particular realization. For example, if the question is "Is this fruit an apple, a pear, or a peach?", then "Apple", "Pear" and "Peach" are the possible correct answers, and a perfect answer is a message that can take three values such that reception of each value of the message identifies the correct answer with certainty.

This implies, for instance, that any ideal question has a unique correct answer. We interpret an ideal question as a real question conditioned on some correct answer value. For example, if a source is shown an apple, then an ideal question can sound like "What kind of fruit is it?". Moreover, if an apple, a pear and a peach are the only three possible kinds of fruit that "exist" in $\Omega$, then the question "What kind of fruit is it?" is clearly a real question that always (in any "true state of the world") has a correct answer. But, for a given "state of the world" (correct answer), the same question can be considered an ideal question. One could say that a source capable of a perfect answer can actually identify which ideal question (out of the three possible) is being asked. The agent, on the other hand, at the time of formulating a question can only consider it as a real question. One can therefore say that an ideal question represents a certain "side" or "aspect" of a real question. Note in this regard that the same ideal question (described by the subset of $\Omega$ corresponding to an apple) can play the role of an "aspect" of different real questions: for instance, "What kind of fruit is it?" and "Is it an apple or not?". As far as answers to ideal questions are concerned, we interpret them as "conditioned" answers to the corresponding real questions.

Let $C = \{C_1, \ldots, C_r\}$ be a real partition question and let $C_l$ be a particular ideal question. Suppose $V(C)$ is some answer to $C$ such that $v_k = \Pr(V(C) = s_k)$, $k = 1, \ldots, m$. Let us denote by $q_k^{(l)}$ the probabilities of the different values of the corresponding answer to $C_l$. Using the definition of an ideal question as a "conditioned" real question and Bayes' rule, we obtain

$$q_k^{(l)} = \Pr(V(C) = s_k \mid \omega \in C_l) = \frac{p_{kl}\, v_k}{P(C_l)}.$$

In particular, if the answer $V(C)$ is perfect, $p_{kl} = \delta_{kl}$ and, as mentioned earlier, $v_k = P(C_k)$. Therefore, in this case, $q_k^{(l)} = \delta_{kl}$, or, in words, any ideal question $C_l$, $l = 1, \ldots, r$, will have just one answer value $s_l$, which identifies the corresponding (unique) correct answer. We see that in this case (and only in this case) an ideal question receives just a single answer value from the source. In some sense, this is "not even a question". But this is perfectly logical if the source is capable of providing a perfect answer to the corresponding real question: it is not "really a question" for the source, since the source is good enough to perfectly distinguish between all $r$ subsets. To see one more illustrative example of what ideal questions are (in our interpretation), consider an instructor asking a student a multiple-choice (real) question: "Is it an apple, a pear or a peach?". In this situation, while both know what the real question is, only the instructor (by virtue of knowing the correct answer) knows which ideal question is being asked. For instance, the ideal question corresponding to an apple (i.e. the ideal question whose correct answer is "Apple") "exists" only when an apple is shown to the student by the instructor. In particular, all characteristics (such as the probabilities $q_k^{(l)}$ of the various answers) of the student's answer to this ideal question are "oblivious" to what the student might say when the fruit shown to him is not an apple.
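The Bayes-rule expression for $q_k^{(l)}$ is easy to evaluate directly. The sketch below uses a hypothetical helper name and invented numbers: a 90%-reliable source answering "Is it an apple or not?" with $P(\text{apple}) = 0.5$ (these numbers satisfy the consistency condition (4)), and a perfect answer, for which the matrix $q_k^{(l)}$ must reduce to the identity.

```python
def ideal_answer_probs(PC, v, p):
    """Bayes' rule: q[l][k] = Pr(V(C) = s_k | w in C_l) = p[k][l] * v[k] / P(C_l).
    PC[l] = P(C_l), v[k] = Pr(V(C) = s_k), and p[k][l] as in eq. (3)."""
    return [[p[k][l] * v[k] / PC[l] for k in range(len(v))]
            for l in range(len(PC))]

# Hypothetical "Is it an apple or not?" with a 90%-reliable source and P(apple) = 0.5.
noisy = ideal_answer_probs([0.5, 0.5], [0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]])
# A perfect answer: p_kl = delta_kl and v_k = P(C_k).
perfect = ideal_answer_probs([0.3, 0.7], [0.3, 0.7], [[1, 0], [0, 1]])
print(perfect)  # [[1.0, 0.0], [0.0, 1.0]]
```

Each row of the result is the answer distribution of one ideal question, so it sums to one whenever the answer is consistent with the prior.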

This interpretation, in particular, allows us to obviate the need for using the logical absurdity element as a possible answer to ideal (and other incomplete) questions. Indeed, as explained above, an ideal question in this picture would have either a single answer (in the case when the source is capable of perfect answers to the real question, one "aspect" of which is represented by the ideal question) or multiple answers, some of which do not coincide with the ideal question's correct answer.



IV. QUESTION DIFFICULTY 

We finally arrive at the point where we can introduce the notion of question difficulty. As was discussed in Section III, the main goal of introducing the question difficulty concept is to be able to predict the expected accuracy of a given information source's answers to any question of the form described in the previous section. By design, the question difficulty should be a real number that is assigned to any question and that depends, besides the question itself, on the state of the original information available and, possibly, on some parameters characterizing the source's knowledge structure, which has to be described by some construct related to the base space $\Omega$. Thus, in general, the question difficulty is a real-valued functional on $\Omega$ which we denote by $G(\Omega, C, P)$.

Our goal in this section is to derive a general form of this functional and - along the way - establish the set of parameters it can depend upon. As was discussed earlier, we proceed by formulating a set of requirements - of both the consistency and the symmetry variety - that the sought-for functional would need to satisfy. As far as symmetry requirements are concerned, in this article we consider isotropic models only, postponing the discussion of more general ones to future publications. We refer to all such requirements as postulates.

As has been mentioned earlier, in the model adopted here incomplete questions are to be understood as auxiliary constructions, while complete questions have a clear meaning. For an incomplete (ideal) question $C \subset \Omega$, the difficulty functional $G(\Omega, C, P)$ can be thought of as the conditional difficulty of any complete question containing the subset $C$, given that the random outcome $\omega$ is in $C$. For example, if the subset $C_1$ represents an apple, $C_2$ a pear and $C_3$ a peach, so that $C_1 \cup C_2 \cup C_3 = \Omega$, then $G(\Omega, C_1, P)$ can be interpreted as the difficulty of the question "Is it an apple, a pear, or a peach?", or, equivalently, "What kind of fruit is it?" (since the source knows that the possible types are apple, pear and peach), provided the correct answer is "Apple".

One reasonable and almost obvious requirement that can be imposed on the question difficulty functional $G(\Omega, C, P)$ is that of certainty, i.e. the difficulty of a question should vanish if there is no new knowledge to acquire given the original state of knowledge. Formally speaking, $G(\Omega, C, P) = 0$ whenever $P(C_j) = 1$ for some value of the index $j$. One can say that in this case the question is already answered at the time of its formulation. These are questions of the kind "Is this red apple red, green or yellow?". Thus we obtain

Postulate Q1 (Certainty). Suppose $C = \{C_1, \ldots, C_r\}$ and $P(C_j) = 1$ for some value of $j$. Then $G(\Omega, C, P) = 0$.

The second postulate we propose requires that the question difficulty functional be continuous 
in all its arguments (which are yet to be determined). 

Postulate Q2 (Continuity). $G(\Omega, C, P)$ is a continuous function of all its arguments.

The next postulate states that, for incomplete questions that are not ideal, i.e. include several subsets, the difficulty is additive: the overall difficulty of the question is the sum of the difficulty of the ideal component and the difficulty of the complete question that results when the ideal question has been answered correctly. Formally, we obtain the following.

Postulate Q3 (Incomplete question decomposition). Let $C = \{C_1, \ldots, C_r\}$ be an incomplete question. Then

$$G(\Omega, C, P) = G(\Omega, \bar C, P) + G(\bar C, C, P_{\bar C}).$$



This postulate describes the difficulty of questions of the sort "What kind of fruit is it and is it red, green or yellow?", given that the correct answer to the first part of the question is "Apple". It states that the difficulty of the overall question is additive: it is equal to the sum of the difficulties of two questions: "What kind of fruit is it?" (conditioned on "Apple" being the correct answer) and "Is this apple red, green or yellow?". Note that if the question $C$ is complete, the first term on the right-hand side of the statement of Postulate Q3 vanishes by Postulate Q1, and the statement of Postulate Q3 reduces to a trivial identity (since in this case $\bar C = \Omega$).

The next postulate states the mean value property of incomplete questions: the difficulty of the question $C \cup C'$, obtained by taking the union of two incomplete non-overlapping partitions $C$ and $C'$, is equal to the mean of the difficulties of the constituent questions weighted by the original measure $P$.

Postulate Q4 (Mean value). Let $C$ and $C'$ be two incomplete questions such that $\bar C \cap \bar C' = \emptyset$. Then

$$G(\Omega, C \cup C', P) = \frac{P(\bar C)\, G(\Omega, C, P) + P(\bar C')\, G(\Omega, C', P)}{P(\bar C \cup \bar C')}.$$

This postulate can be interpreted as follows. Let $C$ and $C'$ each consist of a single subset: $C = \{C\}$ and $C' = \{C'\}$ for $C \subset \Omega$, $C' \subset \Omega$. Assume also that $C \cup C' = \Omega$, so that $\{C, C'\}$ is a complete question. Then the statement of Postulate Q4 would read

$$G(\Omega, \{C, C'\}, P) = P(C)\, G(\Omega, C, P) + P(C')\, G(\Omega, C', P), \tag{5}$$

which is consistent with the interpretation of the difficulty $G(\Omega, C, P)$ of an incomplete question as that of a complete question containing $C$ as one of its options, conditioned on $C$ being true (i.e. conditioned on $\omega \in C$). For instance, let $C$ represent an apple and $C'$ a pear, and assume that these are the only two possible kinds of fruit. Then expression (5) states that the difficulty of the question "What kind of fruit is it?" is equal to the average of the difficulty of the same question over all possible correct answers. From this point of view, Postulate Q4 sounds rather natural and generic. But the real meaning of Postulate Q4 is that it states that the conditional difficulties are independent of the number and measures of the other options (subsets): Postulate Q4 assigns the same conditional difficulty $G(\Omega, C, P)$ to the subset $C \subset \Omega$ regardless of the complete partition it is a member of. For instance, if $C \subset \Omega$ represents an apple then, given that the fruit really is an apple, the difficulty of the question "Is it an apple or not?" would be the same as that of "What kind of fruit is it?", even if the number of possible choices (kinds of fruit) is large. This is, while not unreasonable, still a rather strong assumption which may not hold for realistic information sources (especially human experts). Postulate Q4 can be thought of as an expression of the linearity of the difficulty functional, and it can be expected to be relaxed or modified in more general models.

To state the next postulate we need to introduce a new notion. We say that the parameter space $\Omega$ is homogeneous if the question difficulty functional depends only on the measures of the subsets of any question $C$ on $\Omega$: $G(\Omega, C, P) = f(P(C))$, where $P(C)$ stands for the vector $(P(C_1), \ldots, P(C_r))$. More generally, we say that a subset $D \subset \Omega$ is homogeneous if $G(D, C, P_D) = f(P_D(C))$ as long as $\bar C \subset D$. In particular, any atom (minimal set) of the sigma-algebra $\mathcal{F}$ is homogeneous. Postulate Q5 then states that the difficulty of an incomplete question does not depend on how it is approached (directly or in stages) as long as all the intermediate questions lie inside a homogeneous subset of the parameter space.

Postulate Q5 (Homogeneous incomplete sequentiality). Let $D \subset \Omega$ be a homogeneous subset of the parameter space and let $C$ be a question such that $\bar C \subset D$. Then

$$G(\Omega, C, P) = G(\Omega, D, P) + G(D, C, P_D).$$

To get a little more "feel" for this postulate, think of a question asking to identify a certain animal species. The gradual approach to such a question would involve asking intermediate questions about the class the animal belongs to, its order, suborder, superfamily, family and, finally, the species itself. If the original question is of the "harder than average" variety, it would be easier to answer the question in stages than to answer it right away. On the other hand, if the original question is an easy one (easier than other similar questions), it can be easier to answer it without resorting to the intermediate "guiding" questions. A good example of the latter would be a question about a domestic cat that an average person would be able to answer easily and correctly, whereas the "guiding" questions about class, order, etc. would likely present some difficulty. Accordingly, if all such questions are equally hard (for the same measure), then it would make sense to believe that the intermediate "guiding" questions would not change the difficulty of the original question, just as the postulate states.

Note also that Postulate Q5 looks very similar to Postulate Q3, which does not require any homogeneity to be valid. The main difference is that the second term on the right-hand side of the main statement of Postulate Q5 involves the difficulty of an incomplete question (since in general $\bar C \neq D$), whereas the corresponding term in the statement of Postulate Q3 is required to describe the difficulty of a complete question with the base space $\bar C$.

Finally, it certainly makes sense to require that if $D \subset \Omega$ is homogeneous and $C \subset D$, then $f(P_D(C)) = G(D, C, P_D)$ should be a decreasing function of its argument $P_D(C)$. Indeed, an incomplete question about something "rare" should be more difficult. We thus obtain Postulate Q6.

Postulate Q6 (Homogeneous incomplete monotonicity). Suppose $D \subset \Omega$ is homogeneous and $C \subset D$. Then $f(P_D(C)) = G(D, C, P_D)$ is a decreasing function of its argument $P_D(C)$.

In order to get still more insight into the proposed set of postulates for the question difficulty functional, consider the following alternative postulate.

Postulate Q3' (Complete question sequentiality). Let $C = \{C_1, \ldots, C_r\}$ be a complete question and let $C'$ be its refinement. Then

$$G(\Omega, C', P) = G(\Omega, C, P) + \sum_{j=1}^r P(C_j)\, G(C_j, C'_{C_j}, P_{C_j}),$$

where $C'_{C_j}$ denotes the partition induced by $C'$ on $C_j$.

Postulate Q3' states that if a complete question is made more detailed, the difficulty of the resulting question can be obtained as the sum of the difficulty of the original question and the average (with respect to the measure $P$) of the difficulties of the conditional refinements. For instance, if the original question was "Is it an apple or a pear?" and the refinement sounds like "Is it an apple or a pear, and is its color red, green or yellow?", then Postulate Q3' says that the difficulty of the detailed question is equal to the difficulty of the original question plus the average of the difficulties of the questions "Is this apple red, green or yellow?" and "Is this pear red, green or yellow?". This postulate may seem somewhat more reasonable and grounded in experience than, for instance, the Mean value postulate. It turns out, though, that Postulate Q3' is implied by Postulate Q3 and Postulate Q4, as the following lemma shows.

Lemma 1. Suppose Postulate Q3 and Postulate Q4 hold. Then Postulate Q3' holds as well.

Proof: Let $C'$ be a refinement of $C = \{C_1, \ldots, C_r\}$. Then we can write

$$\begin{aligned}
G(\Omega, C', P) &\overset{(a)}{=} \sum_{j=1}^r P(C_j)\, G(\Omega, C'_{C_j}, P) \\
&\overset{(b)}{=} \sum_{j=1}^r P(C_j) \left( G(\Omega, C_j, P) + G(C_j, C'_{C_j}, P_{C_j}) \right) \\
&= \sum_{j=1}^r P(C_j)\, G(\Omega, C_j, P) + \sum_{j=1}^r P(C_j)\, G(C_j, C'_{C_j}, P_{C_j}) \\
&\overset{(c)}{=} G(\Omega, C, P) + \sum_{j=1}^r P(C_j)\, G(C_j, C'_{C_j}, P_{C_j}),
\end{aligned}$$

where (a) follows from Postulate Q4 since $C' = \bigcup_{j=1}^r C'_{C_j}$, (b) follows from Postulate Q3 and (c) follows from Postulate Q4. □

Thus we see that Postulates Q3 and Q4 can be regarded as a somewhat stronger version of the 
complete question sequentiality property expressed by Postulate Q3'. 

If we now demand that Postulates Q1 through Q6 hold for the question difficulty functional $G(\Omega, C, P)$, the question is what form this functional can possibly take. The answer is given in the following theorem.

Theorem 1. Let the function $G(\Omega, C, P)$, where $C = \{C_1, \ldots, C_r\}$, satisfy Postulates Q1 through Q6. Then it has the form

$$G(\Omega, C, P) = \frac{1}{P(\bar C)} \sum_{j=1}^r u(C_j)\, P(C_j) \log \frac{1}{P(C_j)},$$

where $u(C_j) = \frac{1}{P(C_j)} \int_{C_j} u(\omega)\, dP(\omega)$ and $u: \Omega \to \mathbb{R}$ is an integrable nonnegative function on the parameter space $\Omega$.
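On a finite base space the integral defining $u(C_j)$ becomes a $P$-weighted average over atoms, so the form given by Theorem 1 can be evaluated and checked against the postulates numerically. The following sketch is illustrative only: the helper names, the six-atom space, the values of $u$ and the use of base-2 logarithms (a choice of units) are all assumptions. Its checks confirm Postulate Q1, the mean value property of Postulate Q4 for two disjoint ideal questions, and the decomposition of Postulate Q3.

```python
import math

def measure(P, S):
    return sum(P[w] for w in S)

def u_avg(P, u, S):
    """u(S) = (1/P(S)) * sum_{w in S} u(w) P(w), the finite-space form of the integral."""
    return sum(u[w] * P[w] for w in S) / measure(P, S)

def difficulty(P, u, C, base=None):
    """Theorem 1: G(D, C, P_D) = -(1/P_D(Cbar)) * sum_j u(C_j) P_D(C_j) log2 P_D(C_j).
    D = `base` (the whole space if None); zero-measure subsets are skipped."""
    norm = measure(P, base) if base is not None else sum(P)
    pc = [measure(P, Cj) / norm for Cj in C]
    return -sum(u_avg(P, u, Cj) * pj * math.log2(pj)
                for Cj, pj in zip(C, pc) if pj > 0) / sum(pc)

# Hypothetical six-atom space with a non-constant nonnegative function u.
P = [0.1, 0.2, 0.3, 0.1, 0.2, 0.1]
u = [1.0, 2.0, 0.5, 1.5, 1.0, 1.0]
omega = set(range(6))
```

For instance, `difficulty(P, u, [omega])` vanishes, and `difficulty(P, [1.0] * 6, C)` for a complete question `C` is just the Shannon entropy (in bits) of the distribution $P(C)$.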

Proof: Let $A = \{A_1, \ldots, A_m\}$ be a (complete and sufficiently fine) partition of $\Omega$. We can assume, without loss of generality, that the sigma-algebra $\mathcal{F}$ on $\Omega$ is comprised of all unions of sets in $A$.

Let $D \subset \Omega$ be a homogeneous subset of the parameter space and let $C \subset D$ be an ideal question lying inside $D$. Furthermore, let $C' \subset C$ be another question inside $C$. Then, according to Postulate Q5,

$$G(\Omega, C, P) = G(\Omega, D, P) + G(D, C, P_D), \tag{6}$$

and, since $C$ is homogeneous as well,

$$G(D, C', P_D) = G(D, C, P_D) + G(C, C', P_C). \tag{7}$$

Using the form of $G(\cdot)$ for homogeneous subsets and the fact that $P_C(C') = P_D(C')/P_D(C)$, we obtain from (7)

$$f(P_D(C')) = f(P_D(C)) + f\!\left(\frac{P_D(C')}{P_D(C)}\right),$$

from which it follows, using standard additivity arguments together with the monotonicity and continuity of the function $f(\cdot)$ (which follow from Postulates Q6 and Q2, respectively), that $f(x) = -c \log x$, where $c > 0$ is a constant (see [14] for details). Since the constant $c$ may depend on the particular homogeneous subset $D$, we can denote it by $u(D)$ and obtain

$$G(D, C, P_D) = -u(D) \log P_D(C) \tag{8}$$

for any $C \subset D$ whenever $D$ is homogeneous.
Substituting (8) into (6), we obtain

$$G(\Omega, C, P) = G(\Omega, D, P) + f\!\left(\frac{P(C)}{P(D)}\right) = G(\Omega, D, P) - u(D) \log \frac{P(C)}{P(D)},$$

or, equivalently,

$$G(\Omega, C, P) - G(\Omega, D, P) = -u(D) \log P(C) + u(D) \log P(D), \tag{9}$$

where $C$ is an arbitrary subset of $D$. Since (9) shows that $G(\Omega, C, P) + u(D) \log P(C)$ takes the same value for every $C \subset D$, it follows from (9) and the continuity of the function $G$ (Postulate Q2) that

$$G(\Omega, C, P) = -u(D) \log P(C) + v(D) \tag{10}$$

for any $C \subset D$ whenever $D$ is a homogeneous subset of $\Omega$. Here $v(D)$ is an arbitrary function of $D$. Setting $P(C) = 1$ in (10) and making use of Postulate Q1, we obtain that $v(D) = 0$ and therefore

$$G(\Omega, C, P) = -u(D) \log P(C). \tag{11}$$

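Now let $D = \{D_1, \ldots, D_N\}$ be a complete partition of $\Omega$ into homogeneous subsets $D_j$, $j = 1, \ldots, N$. Let $C \subset \Omega$ be an incomplete ideal question. Then $C = \bigcup_{j=1}^N (C \cap D_j)$, and since $D_j$ is homogeneous and $C \cap D_j \subset D_j$, we obtain using (11) that

$$G(\Omega, C \cap D_j, P) = -u(D_j) \log P(C \cap D_j). \tag{12}$$

On the other hand, by Postulate Q3,

$$G(\Omega, C, P) = G(\Omega, D_C, P) - G(C, D_C, P_C), \tag{13}$$

where $D_C = \{C \cap D_1, \ldots, C \cap D_N\}$ is the partition induced by $D$ on $C$, and

$$G(\Omega, D_C, P) = -\frac{1}{P(C)} \sum_{j=1}^N u(D_j)\, P(C \cap D_j) \log P(C \cap D_j) \tag{14}$$

(using the identity $C = \bigcup_{j=1}^N (C \cap D_j)$, expression (12) and Postulate Q4), and analogously

$$G(C, D_C, P_C) = -\frac{1}{P(C)} \sum_{j=1}^N u(D_j)\, P(C \cap D_j) \log \frac{P(C \cap D_j)}{P(C)}. \tag{15}$$

Substituting (14) and (15) into (13), we obtain

$$G(\Omega, C, P) = -\sum_{j=1}^N u(D_j)\, \frac{P(C \cap D_j)}{P(C)} \log P(C). \tag{16}$$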
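We can rewrite (16) as

$$G(\Omega, C, P) = -u(C) \log P(C), \tag{17}$$

where

$$u(C) = \frac{1}{P(C)} \sum_{j=1}^N u(D_j)\, P(C \cap D_j) \tag{18}$$

can be thought of as the definition of the function $u: \mathcal{F} \to \mathbb{R}$ for inhomogeneous subsets of $\Omega$. If we define the function $u(\omega)$ on $\Omega$ by

$$u(\omega) = \sum_{j=1}^N u(D_j)\, I_{D_j}(\omega),$$

where $I_D(\omega)$ is the indicator function of a subset $D \subset \Omega$, then expression (18) can be written as

$$u(C) = \frac{1}{P(C)} \int_C u(\omega)\, dP(\omega). \tag{19}$$

Finally, if $C = \{C_1, \ldots, C_r\}$ is an arbitrary question, we can use (17) and Postulate Q4 to obtain

$$G(\Omega, C, P) = -\frac{1}{P(\bar C)} \sum_{j=1}^r u(C_j)\, P(C_j) \log P(C_j),$$

where the "weights" $u(C_j)$ of the subsets $C_j$ are given by (19). □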
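The "standard additivity arguments" invoked in the proof, which turn the functional equation obtained from (7) into a logarithm, can be spelled out as follows. This is the textbook reduction to Cauchy's functional equation, offered as a sketch under the assumption that $f$ is defined on $(0, 1]$; the substitution $g(t) = f(e^{-t})$ is an illustrative choice and not necessarily the argument of [14]. The base of the logarithm only rescales the constant $c$.

```latex
% Functional equation obtained from (7), with x = P_D(C) and y = P_C(C') in (0, 1]:
f(xy) = f(x) + f(y).
% Substitute g(t) = f(e^{-t}) for t >= 0; then
g(s + t) = f(e^{-s} e^{-t}) = g(s) + g(t), \qquad s, t \ge 0.
% g is continuous (Postulate Q2), so Cauchy's equation gives g(t) = c\,t, hence
f(x) = g(-\ln x) = -c \ln x, \qquad 0 < x \le 1.
% f is decreasing (Postulate Q6), so c > 0; moreover f(1) = 0, in line with Postulate Q1.
```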

Theorem 1 establishes the general form of the question difficulty functional if isotropy and linearity conditions are imposed. The result depends on the measure $P$ and an integrable function $u(\cdot)$ on the parameter space that can be thought of as a description of the information source's knowledge structure. Note that while the measure is extensive, i.e. the measure of a union of two disjoint subsets of $\Omega$ is the sum of the individual measures ($P(C \cup C') = P(C) + P(C')$ if $C \cap C' = \emptyset$), the function $u$ represents an intensive quantity in that it averages for a union of two disjoint subsets:

$$u(C \cup C') = \frac{P(C)\, u(C) + P(C')\, u(C')}{P(C) + P(C')}.$$

This means, loosely speaking, that while the measure is similar to volume, $u$ is similar to temperature, if parallels with physics are to be drawn. These parallels suggest that the function $u(\omega)$ can be thought of as a temperature-like quantity that is allowed to be different at different points of the parameter space. In the following, we refer to the function $u(\omega)$ as intensity or pseudotemperature. For the same reason, as mentioned earlier, it is convenient to think of question difficulty as the amount of pseudoenergy associated with the question.
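The contrast between the extensive measure and the intensive pseudotemperature can be checked numerically on a finite space. In this sketch the values of $P$ and $u$ are invented and the helper names are hypothetical; `u_avg` implements (19) as a $P$-weighted average over atoms.

```python
def measure(P, S):
    return sum(P[w] for w in S)

def u_avg(P, u, S):
    """u(S) = (1/P(S)) * sum_{w in S} u(w) P(w), i.e. eq. (19) on a finite space."""
    return sum(u[w] * P[w] for w in S) / measure(P, S)

P = [0.1, 0.2, 0.3, 0.1, 0.2, 0.1]
u = [1.0, 2.0, 0.5, 1.5, 1.0, 1.0]   # hypothetical pseudotemperature values
C1, C2 = {0, 1}, {2, 3}               # two disjoint subsets

# Extensive: the measure of the union is the sum of the measures.
extensive_gap = measure(P, C1 | C2) - (measure(P, C1) + measure(P, C2))
# Intensive: u of the union is the P-weighted mean of u(C1) and u(C2).
weighted_mean = ((measure(P, C1) * u_avg(P, u, C1) + measure(P, C2) * u_avg(P, u, C2))
                 / (measure(P, C1) + measure(P, C2)))
intensive_gap = u_avg(P, u, C1 | C2) - weighted_mean
```

Both gaps vanish up to rounding; here $u(C_1) = 0.5/0.3$, $u(C_2) = 0.75$ and $u(C_1 \cup C_2) = 0.8/0.7$.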

It is also convenient to introduce the notion of the entropy of a question $C$ in the usual way, as the Shannon entropy of the probability distribution induced by the partition $C$:

$$H(\Omega, C, P) = -\frac{1}{P(\bar C)} \sum_{j=1}^r P(C_j) \log P(C_j),$$

which differs from the pseudoenergy (difficulty) in that it does not involve the pseudotemperature $u(\cdot)$. It is easy to see that, for an ideal question $C \subset \Omega$, the relationship between pseudoenergy and entropy is simply

$$G(\Omega, C, P) = u(C)\, H(\Omega, C, P),$$

which is identical to the relationship that exists between thermal energy (heat) and entropy in thermodynamics for reversible processes.
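The relation $G = u(C) H$ for ideal questions can be verified numerically. The sketch below takes the entropy to be the expression of Theorem 1 with the weights $u(C_j)$ dropped, which is an assumption consistent with the statement that the entropy "does not involve the pseudotemperature"; helper names, numbers and the base-2 logarithm are likewise illustrative choices.

```python
import math

def measure(P, S):
    return sum(P[w] for w in S)

def u_avg(P, u, S):
    """u(S) = (1/P(S)) * sum_{w in S} u(w) P(w), eq. (19) on a finite space."""
    return sum(u[w] * P[w] for w in S) / measure(P, S)

def entropy(P, C):
    """Assumed form: Theorem 1 with the weights u(C_j) dropped (log base 2)."""
    pc = [measure(P, Cj) for Cj in C]
    return -sum(x * math.log2(x) for x in pc if x > 0) / sum(pc)

def difficulty(P, u, C):
    """Theorem 1: the same sum with each term weighted by u(C_j)."""
    pc = [measure(P, Cj) for Cj in C]
    return -sum(u_avg(P, u, Cj) * x * math.log2(x)
                for Cj, x in zip(C, pc) if x > 0) / sum(pc)

P = [0.1, 0.2, 0.3, 0.1, 0.2, 0.1]
u = [1.0, 2.0, 0.5, 1.5, 1.0, 1.0]   # hypothetical pseudotemperature
ideal = [{0, 1, 2}]
G, H = difficulty(P, u, ideal), entropy(P, ideal)
```

For the ideal question $\{0, 1, 2\}$ with $P = 0.6$, the entropy is $-\log_2 0.6$ and the difficulty is that value multiplied by $u(\{0,1,2\})$.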

The form of the question difficulty functional, as has just been mentioned, is determined up to an arbitrary nonnegative integrable function on the base space $\Omega$. This function describes the knowledge structure of a given information source and is therefore in general different for different information sources. If an agent is faced with an "unfamiliar" information source, he or she would in principle need to estimate the source's knowledge structure as expressed by its pseudotemperature function. This function would describe the source's strengths and weaknesses with regard to its ability to produce accurate answers to various questions. The only practical way of estimating the source's pseudotemperature function $u(\cdot)$ is by "probing" the source's knowledge by asking the source some "sample" questions and comparing the answers to the actual outcomes once the latter become available. The specific methods for estimating the function $u(\cdot)$ are discussed in the companion paper [44].



Observe also that, if one multiplied the function $u(\omega)$ by any positive constant, the difficulty of any question would be multiplied by the same constant. This means that $u(\omega)$ is defined up to an overall scale, which is equivalent to a choice of pseudoenergy measurement units. This overall scale can thus be chosen according to a convention of the agent's choice. One such convention that we will use is that the pseudotemperature is normalized so that its average value on the whole base space is unity:

$$\int_\Omega u(\omega)\, dP(\omega) = 1.$$

One useful property of this convention is that, if the pseudotemperature is constant on $\Omega$, i.e. the whole $\Omega$ is homogeneous, pseudoenergy coincides with entropy and can be measured in the familiar binary "bits". Another useful convention is described in the companion paper [44].
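The scale freedom and the normalization convention can be illustrated on a finite space, where the integral becomes $\sum_\omega u(\omega) P(\omega)$. In this sketch the helper names and numbers are hypothetical, and base-2 logarithms are assumed so that a constant pseudotemperature yields entropy in bits.

```python
import math

def measure(P, S):
    return sum(P[w] for w in S)

def difficulty(P, u, C):
    """Theorem 1 for a complete question C on a finite space (log base 2)."""
    total = 0.0
    for Cj in C:
        pj = measure(P, Cj)
        if pj > 0:
            uj = sum(u[w] * P[w] for w in Cj) / pj   # u(C_j), eq. (19)
            total -= uj * pj * math.log2(pj)
    return total

def normalize(P, u):
    """Rescale u so that its P-average over the whole space equals one."""
    mean = sum(uw * pw for uw, pw in zip(u, P))
    return [uw / mean for uw in u]

P = [0.1, 0.2, 0.3, 0.1, 0.2, 0.1]
u = [2.0, 4.0, 1.0, 3.0, 2.0, 2.0]    # hypothetical values in arbitrary units
C = [{0, 1}, {2, 3}, {4, 5}]
u_norm = normalize(P, u)
```

Multiplying `u` by a constant multiplies every difficulty by that constant, and normalizing a constant `u` turns `difficulty` into the Shannon entropy of $P(C)$ in bits.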



V. RELATIONSHIPS BETWEEN DIFFERENT QUESTIONS 

In this section, we assume that all questions are complete. If $C'$ and $C''$ are two arbitrary (complete) questions, the expression $\sum_{j=1}^{r'} P(C'_j)\, G(C'_j, C''_{C'_j}, P_{C'_j})$ will be denoted $G(\Omega, C''_{C'}, P)$ and called the conditional difficulty, or, equivalently, the conditional pseudoenergy of $C''$ given $C'$. Using this notation, the sequentiality property expressed by Postulate Q3' can be rewritten as

$$G(\Omega, C, P) = G(\Omega, C', P) + G(\Omega, C_{C'}, P), \tag{20}$$

where $C$ is an arbitrary refinement of $C'$.

If $C'$ and $C''$ are two arbitrary (complete) questions and $C = C' \cap C''$ (the partition formed by all nonempty intersections $C'_i \cap C''_k$), then obviously $C$ is a refinement of both $C'$ and $C''$. One can then write the sequentiality property (20) as

$$G(\Omega, C, P) = G(\Omega, C', P) + G(\Omega, C_{C'}, P). \tag{21}$$

But it is easy to see that the partition induced by $C = C' \cap C''$ on any set $C'_j$ in $C'$ is exactly the same as the partition induced on that set by $C''$. Therefore, the term $G(\Omega, C_{C'}, P)$ in (21) can be equivalently written as $G(\Omega, C''_{C'}, P)$, and we arrive at the chain rule for the question difficulty, which we formulate as a lemma.

Lemma 2. If $C'$ and $C''$ are two arbitrary complete questions and $P$ is a measure on $\Omega$, then

$$G(\Omega, C' \cap C'', P) = G(\Omega, C', P) + G(\Omega, C''_{C'}, P).$$
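The chain rule can be checked numerically with the finite-space form of Theorem 1. In the sketch below the helper names (`meet`, `conditional_difficulty`), the two questions and all numbers are hypothetical; `conditional_difficulty` follows the definition of $G(\Omega, C''_{C'}, P)$ given at the start of this section, computing each term on the base set $C'_j$ with the conditional measure $P_{C'_j}$.

```python
import math
from itertools import product

def measure(P, S):
    return sum(P[w] for w in S)

def u_avg(P, u, S):
    return sum(u[w] * P[w] for w in S) / measure(P, S)

def difficulty(P, u, C, base):
    """Theorem 1 for a question C that partitions `base`, using the measure P_base."""
    nb = measure(P, base)
    return -sum(u_avg(P, u, Cj) * (measure(P, Cj) / nb) * math.log2(measure(P, Cj) / nb)
                for Cj in C if measure(P, Cj) > 0)

def meet(C1, C2):
    """The meet of C1 and C2: all nonempty intersections of their subsets."""
    return [A & B for A, B in product(C1, C2) if A & B]

def conditional_difficulty(P, u, C2, C1):
    """G(Omega, C2_{C1}, P) = sum_j P(C1_j) * G(C1_j, C2 restricted to C1_j, P_{C1_j})."""
    return sum(measure(P, A) * difficulty(P, u, [A & B for B in C2 if A & B], A)
               for A in C1)

P = [0.1, 0.2, 0.3, 0.1, 0.2, 0.1]
u = [1.0, 2.0, 0.5, 1.5, 1.0, 1.0]   # hypothetical pseudotemperature
omega = set(range(6))
C1 = [{0, 1, 2}, {3, 4, 5}]          # hypothetical question, e.g. kind of fruit
C2 = [{0, 3}, {1, 4}, {2, 5}]        # hypothetical question, e.g. colour
lhs = difficulty(P, u, meet(C1, C2), omega)
rhs = difficulty(P, u, C1, omega) + conditional_difficulty(P, u, C2, C1)
```

Since the meet is symmetric, the same total is obtained by conditioning in the opposite order, i.e. starting from $C''$ and adding $G(\Omega, C'_{C''}, P)$.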

Again, let $C'$ and $C''$ be two (complete) questions on $\Omega$ and let $C = C' \cap C''$ be the resulting combined question. Then the pseudoenergy overlap $J(\Omega, (C'; C''), P)$ between $C'$ and $C''$ can be defined as the difference between the sum of the difficulties of $C'$ and $C''$ and that of the combined question $C' \cap C''$:

$$J(\Omega, (C'; C''), P) = G(\Omega, C', P) + G(\Omega, C'', P) - G(\Omega, C' \cap C'', P). \tag{22}$$

The definition (22) can be illustrated by a Venn diagram (see Fig. 5). Note that $J(\Omega, (C'; C''), P)$ is symmetric with respect to $C'$ and $C''$. From the point of view of the distributive lattice of questions, $C = C' \cap C''$ is the question representing the meet of the questions $C'$ and $C''$. The join of these questions will in general not be a partition question but instead a question that can be associated with a collection of overlapping subsets of $\Omega$. The pseudoenergy overlap $J(\Omega, (C'; C''), P)$ can then be identified with the difficulty of the join of $C'$ and $C''$. The relation (22) then becomes nothing else but the sum rule for valuations and bi-valuations on lattices.

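One can make use of the sequentiality property of pseudoenergy (Lemma 2, with the roles of $\mathcal{C}'$ and $\mathcal{C}''$ exchanged) to rewrite the expression for the pseudoenergy overlap as follows:

$$ \begin{aligned} J(\Omega, (\mathcal{C}';\mathcal{C}''), P) &= G(\Omega, \mathcal{C}', P) + G(\Omega, \mathcal{C}'', P) - G(\Omega, \mathcal{C}'' \cap \mathcal{C}', P) \\ &= G(\Omega, \mathcal{C}', P) + G(\Omega, \mathcal{C}'', P) - G(\Omega, \mathcal{C}'', P) - G(\Omega, \mathcal{C}'_{\mathcal{C}''}, P) \\ &= G(\Omega, \mathcal{C}', P) - G(\Omega, \mathcal{C}'_{\mathcal{C}''}, P). \end{aligned} $$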
FIG. 5. Venn diagram for pseudoenergy overlap.



We formulate this result as a lemma. 

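Lemma 3. If $\mathcal{C}'$ and $\mathcal{C}''$ are two arbitrary questions and $P$ is a measure on $\Omega$, then the pseudoenergy overlap $J(\Omega, (\mathcal{C}';\mathcal{C}''), P)$ can be found as

$$ J(\Omega, (\mathcal{C}';\mathcal{C}''), P) = G(\Omega, \mathcal{C}', P) - G(\Omega, \mathcal{C}'_{\mathcal{C}''}, P). $$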

Clearly, due to symmetry, the expression for the pseudoenergy overlap stated in Lemma 3 can equivalently be written as $J(\Omega, (\mathcal{C}';\mathcal{C}''), P) = G(\Omega, \mathcal{C}'', P) - G(\Omega, \mathcal{C}''_{\mathcal{C}'}, P)$.

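If an expression for the pseudoenergy overlap as a function of the measure $P$ and the pseudotemperature $u(\omega)$ is desired, the definition (22) together with Theorem 1 can be used to obtain

$$ J(\Omega, (\mathcal{C}';\mathcal{C}''), P) = \sum_{i=1}^{r'} \sum_{j=1}^{r''} u(C'_i \cap C''_j)\, P(C'_i \cap C''_j) \log \frac{P(C'_i \cap C''_j)}{P(C'_i)\, P(C''_j)}, \tag{23} $$

where terms with $P(C'_i \cap C''_j) = 0$ are omitted, since subsets of zero measure do not contribute to the difficulty.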
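The equivalence of (22) and (23) is easy to confirm numerically. The sketch below assumes the same form of $G$ as the examples of Section VI and draws a hypothetical random measure, a random normalized pseudotemperature and two random partitions on a 12-point space; the seed, the sizes and all function names are arbitrary illustrative choices.

```python
import random
from math import log2

random.seed(1)
N = 12  # hypothetical finite parameter space {0, 1, ..., N-1}
raw_p = [random.uniform(0.1, 1.0) for _ in range(N)]
P = [x / sum(raw_p) for x in raw_p]
raw_u = [random.uniform(0.5, 2.0) for _ in range(N)]
norm = sum(u * p for u, p in zip(raw_u, P))
U = [u / norm for u in raw_u]  # normalization: sum_w u(w) P(w) = 1

def mass(cell):    # P(C)
    return sum(P[w] for w in cell)

def weight(cell):  # u(C) P(C), i.e. the integral of u over C
    return sum(U[w] * P[w] for w in cell)

def difficulty(partition):
    return sum(weight(c) * log2(1 / mass(c)) for c in partition if c)

def overlap_def(c1, c2):
    """Definition (22): G(C') + G(C'') - G(C' ∩ C'')."""
    meet = [a & b for a in c1 for b in c2 if a & b]
    return difficulty(c1) + difficulty(c2) - difficulty(meet)

def overlap_sum(c1, c2):
    """Expression (23): sum over the non-empty intersections C'_i ∩ C''_j."""
    return sum(weight(a & b) * log2(mass(a & b) / (mass(a) * mass(b)))
               for a in c1 for b in c2 if a & b)

def random_partition(k):
    """Random partition of the N points into at most k non-empty cells."""
    labels = [random.randrange(k) for _ in range(N)]
    cells = [{w for w in range(N) if labels[w] == j} for j in range(k)]
    return [c for c in cells if c]

C1, C2 = random_partition(3), random_partition(4)
print(overlap_def(C1, C2), overlap_sum(C1, C2))  # agree up to rounding error
```

The agreement holds for any partitions because each cell's weight is additive over the cells of the other partition.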

We will be interested in exploring relationships between different questions: given two distinct questions, we would like to know to what degree they are similar to each other. More specifically, if a perfect answer to one question is available, how is the difficulty of the other question affected? To answer this, let $\mathcal{C}'$ and $\mathcal{C}''$ be two arbitrary complete questions on $\Omega$ and let $V^*(\mathcal{C}')$ be a perfect answer to $\mathcal{C}'$. We would like to find an expression for the conditional difficulty of $\mathcal{C}''$ given $V^*(\mathcal{C}')$. Clearly, since the reception of the value $s'_j$ of $V^*(\mathcal{C}')$ updates the measure $P$ to $P_{C'_j}$, the difficulty of $\mathcal{C}''$ given $V^*(\mathcal{C}') = s'_j$ is equal to

$$ G(\Omega, \mathcal{C}'', P_{C'_j}) = G(C'_j, \mathcal{C}''_{C'_j}, P_{C'_j}), \tag{24} $$
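since subsets of zero measure do not contribute to the difficulty. Therefore the overall (expected) difficulty $G(\Omega, \mathcal{C}'', V^*(\mathcal{C}'))$ of question $\mathcal{C}''$ given a perfect answer $V^*(\mathcal{C}')$ to $\mathcal{C}'$ can be written as

$$ \begin{aligned} G(\Omega, \mathcal{C}'', V^*(\mathcal{C}')) &= \sum_{j=1}^{r'} \Pr\big(V^*(\mathcal{C}') = s'_j\big)\, G(\Omega, \mathcal{C}'', P_{C'_j}) \\ &\overset{(a)}{=} \sum_{j=1}^{r'} P(C'_j)\, G(C'_j, \mathcal{C}''_{C'_j}, P_{C'_j}) = G(\Omega, \mathcal{C}''_{\mathcal{C}'}, P) \\ &\overset{(b)}{=} G(\Omega, \mathcal{C}'', P) - J(\Omega, (\mathcal{C}';\mathcal{C}''), P), \end{aligned} \tag{25} $$

where (a) follows from (24) and the consistency condition, which implies that $\Pr(V^*(\mathcal{C}') = s'_j) = P(C'_j)$; (b) follows from Lemma 3.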

We see from (25) that the conditional difficulty of $\mathcal{C}''$ can be represented as the difference between the standard (unconditional) difficulty and the pseudoenergy overlap $J(\Omega, (\mathcal{C}';\mathcal{C}''), P)$. Thus the latter provides a measure of the reduction in the difficulty of a question that is due to perfect knowledge
of an answer to another question. Such a measure can naturally be termed relative depth jsil] of 
an answer $V(\mathcal{C}')$ (which in general may not be perfect) with respect to question $\mathcal{C}''$. We can formulate the result just obtained as a lemma.

Lemma 4. The relative depth of a perfect answer $V^*(\mathcal{C}')$ to question $\mathcal{C}'$ with respect to question $\mathcal{C}''$ is equal to the pseudoenergy overlap between questions $\mathcal{C}'$ and $\mathcal{C}''$.
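To see (25) and Lemma 4 at work, the sketch below evaluates the left-hand side of (25) directly, by conditioning $P$ on each cell of $\mathcal{C}'$, and compares the resulting reduction in difficulty with the overlap from (22). It borrows the fruit example of Section VI, with $\mathcal{C}'$ the color question and $\mathcal{C}''$ the type question, and assumes the same form of $G$ as the examples there; the function names are hypothetical.

```python
from math import log2

U = {"GA": 8/13, "GPr": 8/13, "YPr": 12/13, "RPr": 12/13,
     "YA": 16/13, "RA": 16/13, "YPc": 16/13, "RPc": 16/13}
P = {w: 1/8 for w in U}
COLOR = [{"GA", "GPr"}, {"YA", "YPr", "YPc"}, {"RA", "RPr", "RPc"}]
TYPE = [{"GA", "YA", "RA"}, {"GPr", "YPr", "RPr"}, {"YPc", "RPc"}]

def difficulty(partition, p):
    g = 0.0
    for cell in partition:
        m = sum(p[w] for w in cell)
        if m > 0:  # zero-measure cells do not contribute, cf. (24)
            g += sum(U[w] * p[w] for w in cell) * log2(1 / m)
    return g

def overlap(c1, c2, p):
    """Definition (22)."""
    meet = [a & b for a in c1 for b in c2 if a & b]
    return difficulty(c1, p) + difficulty(c2, p) - difficulty(meet, p)

def difficulty_given_perfect_answer(c2, c1, p):
    """Left-hand side of (25): answer s'_j arrives with probability P(C'_j)
    and turns P into the conditional measure P_{C'_j}."""
    g = 0.0
    for cell in c1:
        m = sum(p[w] for w in cell)
        if m > 0:
            p_j = {w: (p[w] / m if w in cell else 0.0) for w in p}
            g += m * difficulty(c2, p_j)
    return g

before = difficulty(TYPE, P)
after = difficulty_given_perfect_answer(TYPE, COLOR, P)
depth = before - after  # relative depth of a perfect color answer w.r.t. TYPE
print(round(before, 3), round(after, 3), round(depth, 3))  # 1.595 1.495 0.1
```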

The result of Lemma 4 has a clear intuitive interpretation: if two distinct questions are close, i.e. "almost about the same thing", then knowing a (perfect) answer to one of them nearly answers the other one, reducing its difficulty to a small value compared with the initial difficulty. The pseudoenergy overlap quantifies this notion of closeness for two arbitrary questions.



VI. EXAMPLES 



We consider an example with a finite parameter space first. Let $\Omega$ consist of 8 elements, corresponding to green, yellow and red apples (denoted GA, YA and RA, respectively), green, yellow and red pears (denoted GPr, YPr and RPr), and yellow and red peaches (denoted YPc and RPc). Let all elements be equiprobable, so that $P(\omega) = \frac18$ for all $\omega \in \Omega$. The function $u(\omega)$ describes the relative difficulty of the respective ideal questions. To this effect, let us suppose that it was found (say, using estimation procedures described in [4]) that $u(\mathrm{GA}) = u(\mathrm{GPr}) = 1$, $u(\mathrm{YPr}) = u(\mathrm{RPr}) = 1.5$ and $u(\mathrm{YA}) = u(\mathrm{RA}) = u(\mathrm{YPc}) = u(\mathrm{RPc}) = 2$. These raw values have mean $\frac{13}{8}$; normalizing $u(\cdot)$ so that $\int_\Omega u(\omega)\,dP(\omega) = 1$, one obtains $u(\mathrm{GA}) = u(\mathrm{GPr}) = \frac{8}{13}$, $u(\mathrm{YPr}) = u(\mathrm{RPr}) = \frac{12}{13}$ and $u(\mathrm{YA}) = u(\mathrm{RA}) = u(\mathrm{YPc}) = u(\mathrm{RPc}) = \frac{16}{13}$.

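The difficulties of the ideal questions corresponding to individual elements of $\Omega$ can be found as follows (here and below, logarithms are to base 2, so that $\log 8 = 3$): $G(\Omega, \mathrm{GA}, P) = G(\Omega, \mathrm{GPr}, P) = \frac{8}{13}\log 8 = \frac{24}{13}$, $G(\Omega, \mathrm{YPr}, P) = G(\Omega, \mathrm{RPr}, P) = \frac{12}{13}\log 8 = \frac{36}{13}$ and $G(\Omega, \mathrm{YA}, P) = G(\Omega, \mathrm{RA}, P) = G(\Omega, \mathrm{YPc}, P) = G(\Omega, \mathrm{RPc}, P) = \frac{16}{13}\log 8 = \frac{48}{13}$. The difficulty of the exhaustive question (which asks to determine the type and color of the fruit presented to the source) can be found as the expectation of the difficulties of all these ideal questions. Denoting the corresponding (finest) partition of $\Omega$ by $\mathcal{C}_f$, we obtain

$$ G(\Omega, \mathcal{C}_f, P) = \sum_{\omega \in \Omega} P(\omega)\, G(\Omega, \omega, P) = 3. $$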
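The normalization and the ideal-question difficulties above can be reproduced with exact rational arithmetic. This is a minimal sketch using Python's standard `fractions` module; the dictionary names are hypothetical, and the ideal-question difficulty is taken as $u(\omega)\log\frac{1}{P(\omega)}$, as in the values just computed.

```python
from fractions import Fraction as F

# Raw pseudotemperatures of the fruit example before normalization.
RAW_U = {"GA": F(1), "GPr": F(1), "YPr": F(3, 2), "RPr": F(3, 2),
         "YA": F(2), "RA": F(2), "YPc": F(2), "RPc": F(2)}
P = {w: F(1, 8) for w in RAW_U}  # equiprobable outcomes

# Normalize so that the P-expectation of u equals 1.
mean_u = sum(RAW_U[w] * P[w] for w in RAW_U)
U = {w: RAW_U[w] / mean_u for w in RAW_U}

# Ideal question {w}: difficulty u(w) * log2(1/P(w)), and log2(8) = 3 exactly.
LOG2_INV_P = 3
ideal = {w: U[w] * LOG2_INV_P for w in U}

# Exhaustive question: expectation of the ideal-question difficulties.
exhaustive = sum(P[w] * ideal[w] for w in U)

print(mean_u)                                  # 13/8
print(U["GA"], U["YPr"], U["YA"])              # 8/13 12/13 16/13
print(ideal["GA"], ideal["YPr"], ideal["YA"])  # 24/13 36/13 48/13
print(exhaustive)                              # 3
```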

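Now let us consider the difficulties of other complete questions. Let the first of these be "Is the fruit green or not?". Let $C_g = \{\mathrm{GA}, \mathrm{GPr}\} \subset \Omega$ be the subset consisting of all green fruit (apples and pears) and let $\bar{C}_g = \Omega \setminus C_g$ be the subset containing fruit of all other colors (red and yellow). The values of $u(\cdot)$ for the sets in this partition are $u(C_g) = \frac{8}{13}$ and $u(\bar{C}_g) = \frac13 \cdot \frac{12}{13} + \frac23 \cdot \frac{16}{13} = \frac{44}{39}$. The measures are $P(C_g) = \frac14$ and $P(\bar{C}_g) = \frac34$. Thus the difficulty of the question "Is the fruit green or not?" can be found as

$$ G(\Omega, \{C_g, \bar{C}_g\}, P) = u(C_g)P(C_g)\log\frac{1}{P(C_g)} + u(\bar{C}_g)P(\bar{C}_g)\log\frac{1}{P(\bar{C}_g)} = \frac{2}{13}\log 4 + \frac{11}{13}\log\frac43 \approx 0.66. $$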

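Consider another question with subset measures (and thus "metric elaborateness") equal to those of $\{C_g, \bar{C}_g\}$: "Is the fruit a peach or not?". The corresponding partition is $\{C_{pc}, \bar{C}_{pc}\}$, where $C_{pc} = \{\mathrm{YPc}, \mathrm{RPc}\}$ and $\bar{C}_{pc} = \Omega \setminus C_{pc}$. The values of the function $u(\cdot)$ on these subsets are $u(C_{pc}) = \frac{16}{13}$ and $u(\bar{C}_{pc}) = \frac13 \cdot \frac{8}{13} + \frac13 \cdot \frac{12}{13} + \frac13 \cdot \frac{16}{13} = \frac{12}{13}$. The measures are $P(C_{pc}) = \frac14$ and $P(\bar{C}_{pc}) = \frac34$. The difficulty of the question $\{C_{pc}, \bar{C}_{pc}\}$ is

$$ G(\Omega, \{C_{pc}, \bar{C}_{pc}\}, P) = u(C_{pc})P(C_{pc})\log\frac{1}{P(C_{pc})} + u(\bar{C}_{pc})P(\bar{C}_{pc})\log\frac{1}{P(\bar{C}_{pc})} = \frac{4}{13}\log 4 + \frac{9}{13}\log\frac43 \approx 0.90. $$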
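Both binary-question difficulties follow from the same few lines. A sketch in Python, assuming the form of $G$ used in these examples (the helper names are hypothetical):

```python
from math import log2

U = {"GA": 8/13, "GPr": 8/13, "YPr": 12/13, "RPr": 12/13,
     "YA": 16/13, "RA": 16/13, "YPc": 16/13, "RPc": 16/13}
P = {w: 1/8 for w in U}

def difficulty(partition):
    """Sum over cells C of u(C) P(C) log2(1/P(C))."""
    total = 0.0
    for cell in partition:
        m = sum(P[w] for w in cell)
        total += sum(U[w] * P[w] for w in cell) * log2(1 / m)
    return total

def avg_u(cell):
    """u(C): the P-average of u over the cell C."""
    return sum(U[w] * P[w] for w in cell) / sum(P[w] for w in cell)

def yes_no(subset):
    """Binary question {C, complement of C}."""
    return [set(subset), set(U) - set(subset)]

green = yes_no({"GA", "GPr"})
peach = yes_no({"YPc", "RPc"})
print(round(difficulty(green), 2), round(difficulty(peach), 2))  # 0.66 0.9
```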

We see that this question is somewhat more difficult than the question of whether the fruit is green. The main reason for this difference is that, to answer the question of whether the fruit is a peach, one may need to tell a peach from an apple of similar (warm) color, which is relatively difficult, while answering the question of whether the fruit is green does not involve any "hard" decisions, since the color itself is distinct.

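Consider now the question "What color is the given fruit?" on the one hand and "What type is the given fruit?" on the other. The former question can be represented as the partition $\mathcal{C}_c = \{C_g, C_y, C_r\}$, where $C_g = \{\mathrm{GA}, \mathrm{GPr}\}$, $C_y = \{\mathrm{YA}, \mathrm{YPr}, \mathrm{YPc}\}$ and $C_r = \{\mathrm{RA}, \mathrm{RPr}, \mathrm{RPc}\}$; the latter question can be identified with the partition $\mathcal{C}_t = \{C_a, C_{pr}, C_{pc}\}$, where $C_a = \{\mathrm{GA}, \mathrm{YA}, \mathrm{RA}\}$, $C_{pr} = \{\mathrm{GPr}, \mathrm{YPr}, \mathrm{RPr}\}$ and $C_{pc} = \{\mathrm{YPc}, \mathrm{RPc}\}$. The values of $u(\cdot)$ on these subsets are $u(C_g) = \frac{8}{13}$, $u(C_y) = u(C_r) = \frac{44}{39}$; $u(C_a) = \frac{40}{39}$, $u(C_{pr}) = \frac13 \cdot \frac{8}{13} + \frac23 \cdot \frac{12}{13} = \frac{32}{39}$, $u(C_{pc}) = \frac{16}{13}$. The measures are $P(C_g) = \frac14$, $P(C_y) = P(C_r) = \frac38$; $P(C_a) = P(C_{pr}) = \frac38$, $P(C_{pc}) = \frac14$. Thus the difficulties of these two questions are

$$ G(\Omega, \{C_g, C_y, C_r\}, P) = u(C_g)P(C_g)\log\frac{1}{P(C_g)} + u(C_y)P(C_y)\log\frac{1}{P(C_y)} + u(C_r)P(C_r)\log\frac{1}{P(C_r)} = \frac{11}{13}\log\frac83 + \frac{2}{13}\log 4 \approx 1.51 $$

and

$$ G(\Omega, \{C_a, C_{pr}, C_{pc}\}, P) = u(C_a)P(C_a)\log\frac{1}{P(C_a)} + u(C_{pr})P(C_{pr})\log\frac{1}{P(C_{pr})} + u(C_{pc})P(C_{pc})\log\frac{1}{P(C_{pc})} = \frac{9}{13}\log\frac83 + \frac{4}{13}\log 4 \approx 1.60, $$

respectively.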

The question about color turns out to be slightly easier than the one about type. Qualitatively, the main reason for this difference is that the relatively rare event (the fruit being green, or being a peach, respectively), which contributes more to the difficulty because of the $\log\frac{1}{P}$ factor, has a smaller average value of the pseudotemperature $u(\cdot)$ in the case of the question about the fruit color.
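The averages $u(\cdot)$ on the cells and both difficulties can be checked with a short sketch, again assuming the form of $G$ used in these examples; the cell labels are hypothetical and the averages are printed in units of $\frac{1}{39}$.

```python
from math import log2

U = {"GA": 8/13, "GPr": 8/13, "YPr": 12/13, "RPr": 12/13,
     "YA": 16/13, "RA": 16/13, "YPc": 16/13, "RPc": 16/13}
P = {w: 1/8 for w in U}

def difficulty(partition):
    total = 0.0
    for cell in partition:
        m = sum(P[w] for w in cell)
        total += sum(U[w] * P[w] for w in cell) * log2(1 / m)
    return total

def avg_u(cell):
    """u(C): the P-average of u over the cell C."""
    return sum(U[w] * P[w] for w in cell) / sum(P[w] for w in cell)

COLOR = {"green": {"GA", "GPr"}, "yellow": {"YA", "YPr", "YPc"},
         "red": {"RA", "RPr", "RPc"}}
TYPE = {"apple": {"GA", "YA", "RA"}, "pear": {"GPr", "YPr", "RPr"},
        "peach": {"YPc", "RPc"}}

for name, question in [("color", COLOR), ("type", TYPE)]:
    averages = {k: round(avg_u(c) * 39) for k, c in question.items()}  # units of 1/39
    # difficulty is about 1.51 for color and 1.6 for type
    print(name, averages, round(difficulty(question.values()), 2))
```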

The pseudoenergy overlap between the "color" and "type" questions can be calculated using expression (23):

$$ J(\Omega, (\mathcal{C}_c;\mathcal{C}_t), P) = \frac{6}{13}\log\frac43 + \frac{7}{13}\log\frac89 \approx 0.100, $$

indicating that, while perfect knowledge of the fruit color helps in answering the question about its type, the reduction in the difficulty of the "type" question due to knowledge of the color is relatively mild, so the question about the fruit type remains almost as hard as it was before the color became known.
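Expression (23) for the color and type questions can be evaluated term by term. A minimal sketch, with hypothetical names, that also compares the result with the closed form above:

```python
from math import log2

U = {"GA": 8/13, "GPr": 8/13, "YPr": 12/13, "RPr": 12/13,
     "YA": 16/13, "RA": 16/13, "YPc": 16/13, "RPc": 16/13}
P = {w: 1/8 for w in U}
COLOR = [{"GA", "GPr"}, {"YA", "YPr", "YPc"}, {"RA", "RPr", "RPc"}]
TYPE = [{"GA", "YA", "RA"}, {"GPr", "YPr", "RPr"}, {"YPc", "RPc"}]

def mass(cell):
    return sum(P[w] for w in cell)

def overlap(c1, c2):
    """Expression (23): sum over non-empty intersections of
    u(C'_i ∩ C''_j) P(C'_i ∩ C''_j) log2[P(C'_i ∩ C''_j) / (P(C'_i) P(C''_j))]."""
    total = 0.0
    for a in c1:
        for b in c2:
            ab = a & b
            if ab:
                total += sum(U[w] * P[w] for w in ab) * log2(mass(ab) / (mass(a) * mass(b)))
    return total

J = overlap(COLOR, TYPE)
closed_form = 6/13 * log2(4/3) + 7/13 * log2(8/9)
print(round(J, 3), round(closed_form, 3))  # 0.1 0.1
```

The intersections with ratio $\frac43$ (the green fruit and the two peaches) contribute positively, while the four yellow or red apples and pears, with ratio $\frac89$, contribute negatively.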

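For an example with an infinite parameter space, consider $\Omega = [0,1]^2$ with the uniform measure $P$ (see Fig. 6 for an illustration). Let $u(\omega) = \frac32(\omega_1^2 + \omega_2^2)$, where $\omega_1$ and $\omega_2$ are coordinates on $\Omega$ (this choice satisfies $\int_\Omega u\,dP = 1$). Let us consider three different questions $\mathcal{C}_i = \{C_i, \bar{C}_i\}$, $i = 1,2,3$, where $C_1 = \{\omega : \omega_1 \in [\frac12, 1], \omega_2 \in [\frac12, 1]\}$, $C_2 = \{\omega : \omega_1 \in [0, \frac12], \omega_2 \in [0, \frac12]\}$ and $C_3 = \{\omega : \omega_1 \in [0, \frac12], \omega_2 \in [\frac12, 1]\}$. It is easy to see that $P(C_i) = \frac14$ for $i = 1,2,3$.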
FIG. 6. The parameter space $\Omega = [0,1]^2$ and subsets $C_i$, $i = 1,2,3$.

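For question $\mathcal{C}_1$, we have $u(C_1) = 6\int_{1/2}^{1} d\omega_1 \int_{1/2}^{1} d\omega_2\,(\omega_1^2 + \omega_2^2) = \frac74$. Then, using the normalization condition $u(C_1)P(C_1) + u(\bar{C}_1)P(\bar{C}_1) = 1$, we obtain $u(\bar{C}_1) = \frac34$, which allows us to compute the difficulty:

$$ G(\Omega, \{C_1, \bar{C}_1\}, P) = u(C_1)P(C_1)\log\frac{1}{P(C_1)} + u(\bar{C}_1)P(\bar{C}_1)\log\frac{1}{P(\bar{C}_1)} = \frac{7}{16}\log 4 + \frac{9}{16}\log\frac43 \approx 1.108. $$

For question $\mathcal{C}_2$, we obtain $u(C_2) = 6\int_0^{1/2} d\omega_1 \int_0^{1/2} d\omega_2\,(\omega_1^2 + \omega_2^2) = \frac14$ and, making use of the normalization condition, $u(\bar{C}_2) = \frac54$. The difficulty functional value for this question becomes

$$ G(\Omega, \{C_2, \bar{C}_2\}, P) = u(C_2)P(C_2)\log\frac{1}{P(C_2)} + u(\bar{C}_2)P(\bar{C}_2)\log\frac{1}{P(\bar{C}_2)} = \frac{1}{16}\log 4 + \frac{15}{16}\log\frac43 \approx 0.514. $$

Finally, for question $\mathcal{C}_3$, we have $u(C_3) = 6\int_0^{1/2} d\omega_1 \int_{1/2}^{1} d\omega_2\,(\omega_1^2 + \omega_2^2) = 1$ and, obviously, $u(\bar{C}_3) = 1$. The difficulty functional is

$$ G(\Omega, \{C_3, \bar{C}_3\}, P) = u(C_3)P(C_3)\log\frac{1}{P(C_3)} + u(\bar{C}_3)P(\bar{C}_3)\log\frac{1}{P(\bar{C}_3)} = \frac14\log 4 + \frac34\log\frac43 \approx 0.811. $$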

We see that, among these three questions, $\mathcal{C}_1$ turns out to be the most difficult, while the difficulty of $\mathcal{C}_2$ is the smallest of the three. The reason is that $\mathcal{C}_1$ includes a small-measure (rare) set in the region of high values of the pseudotemperature $u(\omega)$. On the other hand, the rare subset in $\mathcal{C}_2$ is located in the region of small values of $u(\omega)$. Question $\mathcal{C}_3$ is naturally placed between these two extremes: its rare subset is located in a region of moderate values of the field $u(\omega)$, so that the difficulty weight of this subset is equal to the average over the whole parameter space.
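The three difficulties can be reproduced exactly, since $u$ is a polynomial and every set involved is a rectangle. In the sketch below (function names hypothetical), $\int_C u\,dP$ is integrated in closed form with exact fractions, and the weight of the complement is obtained from the normalization $\int_\Omega u\,dP = 1$.

```python
from fractions import Fraction as F
from math import log2

HALF = F(1, 2)

def u_mass(x0, x1, y0, y1):
    """Exact integral of u(w) = (3/2)(w1^2 + w2^2) over [x0, x1] x [y0, y1]
    under the uniform measure, i.e. u(C) P(C) for that rectangle C."""
    return F(1, 2) * ((x1**3 - x0**3) * (y1 - y0) + (x1 - x0) * (y1**3 - y0**3))

def binary_difficulty(rect):
    """Difficulty of {C, complement of C} for a rectangle C = (x0, x1, y0, y1)."""
    x0, x1, y0, y1 = rect
    p = (x1 - x0) * (y1 - y0)  # P(C)
    w = u_mass(*rect)          # u(C) P(C)
    # Normalization (integral of u over [0,1]^2 is 1) gives u(C-bar) P(C-bar) = 1 - w.
    return float(w) * log2(float(1 / p)) + float(1 - w) * log2(float(1 / (1 - p)))

QUESTIONS = {"C1": (HALF, 1, HALF, 1), "C2": (0, HALF, 0, HALF), "C3": (0, HALF, HALF, 1)}
for name, rect in QUESTIONS.items():
    p = (rect[1] - rect[0]) * (rect[3] - rect[2])
    print(name, u_mass(*rect) / p, round(binary_difficulty(rect), 3))
# C1 7/4 1.108
# C2 1/4 0.514
# C3 1 0.811
```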
The overlaps between these questions can easily be computed using expression (23):

$$ J(\Omega, (\mathcal{C}_1;\mathcal{C}_2), P) = \frac12\log\frac43 + \frac12\log\frac89 \approx 0.123, $$

$$ J(\Omega, (\mathcal{C}_1;\mathcal{C}_3), P) = \frac{11}{16}\log\frac43 + \frac{5}{16}\log\frac89 \approx 0.232, $$

and

$$ J(\Omega, (\mathcal{C}_2;\mathcal{C}_3), P) = \frac{5}{16}\log\frac43 + \frac{11}{16}\log\frac89 \approx 0.013, $$

showing that the most difficult questions, $\mathcal{C}_1$ and $\mathcal{C}_3$, also exhibit the largest overlap, which agrees with the common-sense notion that knowing a perfect answer to a more difficult question can give more help in answering another question.
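Because every set involved in these three questions is a union of the four quadrants of $[0,1]^2$, expression (23) can be evaluated exactly at the level of quadrants. A sketch under that observation; the quadrant labels are hypothetical (first letter for $\omega_1$, second for $\omega_2$; L for the lower half, U for the upper half):

```python
from fractions import Fraction as F
from math import log2

HALF = F(1, 2)
# Quadrants of [0,1]^2 as (x0, x1, y0, y1).
QUADRANTS = {"LL": (0, HALF, 0, HALF), "LU": (0, HALF, HALF, 1),
             "UL": (HALF, 1, 0, HALF), "UU": (HALF, 1, HALF, 1)}

def u_mass(x0, x1, y0, y1):
    """Exact integral of u(w) = (3/2)(w1^2 + w2^2) over a rectangle (uniform P)."""
    return F(1, 2) * ((x1**3 - x0**3) * (y1 - y0) + (x1 - x0) * (y1**3 - y0**3))

def prob(cell):    # P of a union of quadrants
    return F(len(cell), 4)

def weight(cell):  # integral of u over a union of quadrants
    return sum((u_mass(*QUADRANTS[q]) for q in cell), F(0))

def question(rare):
    """Binary question {C, complement of C}, with C given as a set of quadrant labels."""
    return [set(rare), set(QUADRANTS) - set(rare)]

def overlap(c1, c2):
    """Expression (23), evaluated on unions of quadrants."""
    total = 0.0
    for a in c1:
        for b in c2:
            ab = a & b
            if ab:
                total += float(weight(ab)) * log2(float(prob(ab) / (prob(a) * prob(b))))
    return total

Q = {"C1": question({"UU"}), "C2": question({"LL"}), "C3": question({"LU"})}
for x, y in [("C1", "C2"), ("C1", "C3"), ("C2", "C3")]:
    print(x, y, round(overlap(Q[x], Q[y]), 3))
# C1 C2 0.123
# C1 C3 0.232
# C2 C3 0.013
```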

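It is interesting to consider the limit in which the measure of the rare set approaches zero. For this purpose, let $C_1 = \{\omega : \omega_1 \in [1-a, 1], \omega_2 \in [1-a, 1]\}$, $C_2 = \{\omega : \omega_1 \in [0, a], \omega_2 \in [0, a]\}$ and $C_3 = \{\omega : \omega_1 \in [0, a], \omega_2 \in [1-a, 1]\}$, and let $\mathcal{C}_i = \{C_i, \bar{C}_i\}$ for $i = 1,2,3$. Let $u(\omega) = \frac{n+1}{2}(\omega_1^n + \omega_2^n)$, where $n \ge 2$ is an integer and $\omega \in \Omega = [0,1]^2$. Then, repeating the calculations for the previously considered example, taking the limit $a \to 0$ and retaining only the terms of the lowest order in $a$, we obtain

$$ G(\Omega, \{C_1, \bar{C}_1\}, P) \simeq (n+1)a^2\log\frac{1}{a^2} + \log e \cdot a^2 \simeq (n+1)a^2\log\frac{1}{a^2}, $$

$$ G(\Omega, \{C_2, \bar{C}_2\}, P) \simeq \log e \cdot a^2, $$

and

$$ G(\Omega, \{C_3, \bar{C}_3\}, P) \simeq \frac{n+1}{2}a^2\log\frac{1}{a^2} + \log e \cdot a^2 \simeq \frac{n+1}{2}a^2\log\frac{1}{a^2}. $$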

Again, we can see that question $\mathcal{C}_1$ ends up being the most difficult one, with $\mathcal{C}_2$ being the least difficult. It is interesting to note that, to leading order in $a$, the difficulties of $\mathcal{C}_1$ and $\mathcal{C}_3$ behave as $a^2\log\frac{1}{a}$ (with only the numerical coefficient being different), while the difficulty of $\mathcal{C}_2$ behaves as $a^2$. A related observation is that, in this limit, the difficulty of both $\mathcal{C}_1$ and $\mathcal{C}_3$ is dominated by the rare subset, while that of $\mathcal{C}_2$ is dominated by the larger subset, whose measure approaches 1, since the contribution of the rare subset is diminished by the low value of the pseudotemperature $u(\cdot)$ over that subset.
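The leading-order behavior can be checked against the exact difficulties at a small but finite $a$. The sketch below (the values $a = 10^{-4}$ and $n = 3$ are arbitrary choices; names are hypothetical) integrates $u$ over the three squares in closed form and prints the ratio of each exact difficulty to its leading-order term. For $\mathcal{C}_1$ and $\mathcal{C}_3$ the ratios approach 1 only slowly, because the neglected $\log e \cdot a^2$ terms are suppressed merely by a factor of order $1/\log\frac{1}{a}$.

```python
from math import e, log, log1p, log2

def binary_difficulty(p, w):
    """Difficulty of {C, complement}: p = P(C), w = u(C)P(C); by normalization the
    complement carries weight 1 - w. log1p keeps log2(1/(1-p)) accurate for tiny p."""
    return w * log2(1 / p) + (1 - w) * (-log1p(-p) / log(2))

def rare_weights(a, n):
    """Exact integrals of u = ((n+1)/2)(w1^n + w2^n) over the three a-by-a squares."""
    edge = a * (1 - (1 - a) ** (n + 1))  # square [1-a, 1] x [1-a, 1] (C1)
    corner = a ** (n + 2)                # square [0, a] x [0, a] (C2)
    return {"C1": edge, "C2": corner, "C3": (edge + corner) / 2}  # C3 mixes the two

a, n = 1e-4, 3
p = a * a  # P(C_i) for every i
leading = {"C1": (n + 1) * p * log2(1 / p),
           "C2": log2(e) * p,
           "C3": (n + 1) / 2 * p * log2(1 / p)}
G = {}
for name, w in rare_weights(a, n).items():
    G[name] = binary_difficulty(p, w)
    print(name, f"{G[name]:.3e}", round(G[name] / leading[name], 3))
```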

VII. CONCLUSION 

This article initiated the development of a general quantitative framework for describing the process of extracting information from information sources capable of providing answers to given questions. The main motivation for such a framework is the need for optimal decision making in situations characterized by incomplete information and the availability of additional information sources. The framework is expected to be especially useful when the knowledge the information sources possess is of a relatively "loose" variety, i.e. cannot be readily represented in a form admitting direct use in a mathematical formulation. A typical example of such a source would be a human expert who can express a preference for one of two regions in the parameter space but would find it difficult to produce an accurate probability distribution over the parameter space.

The main components of the proposed framework are questions, answers and information sources. The subject of the present article is questions and, in particular, the question difficulty functional. The purpose of the latter is to measure the expected accuracy the given source can achieve in answering various questions. The idea is that a source would answer easy questions well, but the accuracy of its answers would decrease with increasing difficulty of the questions. The overall form of the question difficulty functional is in general determined by the constraints the difficulty functional is required to satisfy. In this article we assumed that the question difficulty is linear and isotropic on the parameter space. The resulting form was then derived from a system of postulates expressing the desired properties along with more general consistency requirements.

It turns out that the resulting question difficulty functional depends on a single scalar function $u(\cdot)$ defined on the parameter space and can be interpreted, using parallels with thermodynamics, as an energy-like quantity, while the function $u(\cdot)$ takes on the role of a temperature that is allowed to take different values at different points of the parameter space. It is interesting to contrast the resulting difficulty functional with the corresponding Shannon entropy, which can be interpreted as the minimum expected number of bits required to communicate a (perfect) answer to the question under consideration. Continuing the thermodynamic analogy, one can say that, while the former is akin to thermal energy, the latter can be likened to entropy.
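The contrast with the Shannon entropy can be made concrete: setting $u \equiv 1$ in the form of $G$ used in the examples of Section VI turns the difficulty into the Shannon entropy of the partition (in bits). The sketch below reuses the fruit example as illustrative data (names hypothetical) and shows that the entropy cannot distinguish the "green" and "peach" questions, whose subset measures coincide, whereas the difficulty functional ranks them differently.

```python
from math import log2

U = {"GA": 8/13, "GPr": 8/13, "YPr": 12/13, "RPr": 12/13,
     "YA": 16/13, "RA": 16/13, "YPc": 16/13, "RPc": 16/13}
P = {w: 1/8 for w in U}
ONES = {w: 1.0 for w in U}  # u identically 1: no source-specific structure

def difficulty(partition, u):
    total = 0.0
    for cell in partition:
        m = sum(P[w] for w in cell)
        total += sum(u[w] * P[w] for w in cell) * log2(1 / m)
    return total

questions = {
    "green": [{"GA", "GPr"}, set(U) - {"GA", "GPr"}],
    "peach": [{"YPc", "RPc"}, set(U) - {"YPc", "RPc"}],
}
for name, q in questions.items():
    # Shannon entropy (u = 1) versus difficulty with the fruit-example u
    print(name, round(difficulty(q, ONES), 3), round(difficulty(q, U), 3))
# green 0.811 0.659
# peach 0.811 0.903
```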

It is worth noting, as was alluded to in Section III, that the system of postulates used in the present
article is somewhat restrictive in that it has the isotropy of the source's knowledge structure "built in". For instance, the proposed system of postulates implicitly assumes that the difficulty of an ideal question is a well-defined quantity, regardless of the real question it is "an aspect of" (in the sense explained in Section III). The consequence is that the resulting source knowledge structure, and hence the form of the difficulty functional, is described by a scalar function on the base space. On the other hand, imagine, for example, an information source that can answer questions about fruit color much better than those about its kind. In this case, the source's knowledge structure would exhibit a pronounced directional dependence. Preliminary investigations show that such knowledge structures can be captured if the system of postulates is weakened somewhat to exclude the implicit assumptions mentioned above. In such cases, the difficulty functional can be shown to depend on an arbitrary rank-2 tensor, as opposed to a scalar function in the isotropic case described in the present article. Details will appear in future publications.

Note also that, while we used a "traditional-style" axiomatic approach to determine the form of the question difficulty functional, we believe that it can also be derived along the lines of the more recent order-theoretic approach described in [22, 23], as was already mentioned in the Introduction. Specifically, the question relevance measure introduced in [22] is a natural bi-valuation on the distributive lattice of questions, defined as down-sets of all subsets of the elements of the lattice of logical assertions. The partial order on the lattice of questions is then given by set inclusion, which can be interpreted as "answering". The relevance (which, we believe, would be better called "bearing", as originally suggested by Cox) was defined in [22, 23] as a bi-valuation on the sublattice of real questions that gives the degree to which a real question resolves the central issue. Since partition (complete partition in our terminology) questions are the join-irreducible elements of the real lattice, their valuations can be assigned arbitrarily [21]. Suppose now one wants the question
valuations to be related to the valuations on the corresponding lattice of logical assertions. Then, if one makes a single assumption that the valuation for a partition question depends only on the probabilities of the assertions (subsets of $\Omega$ in our interpretation) constituting this question, the constraints imposed by the lattice structure lead to a unique form of the valuation, equal (up to one multiplicative and one additive constant) to the Shannon entropy. On the other hand, if one instead assumes that the question valuation depends, besides the subset probabilities that encode the initial information about the system, on some geometric object on the base space $\Omega$ that describes the particular information source's knowledge structure, we believe that one would recover the difficulty functional. The nature of the geometric object would then be determined by symmetry considerations. This issue is currently under investigation.